HOW MANY ENTRIES OF A TYPICAL ORTHOGONAL MATRIX
CAN BE APPROXIMATED BY INDEPENDENT NORMALS?

By Tiefeng Jiang 
University of Minnesota 

We solve an open problem of Diaconis that asks what are the
largest orders of $p_n$ and $q_n$ such that $Z_n$, the $p_n \times q_n$ upper left
block of a random matrix $\Gamma_n$ which is uniformly distributed on the
orthogonal group $O(n)$, can be approximated by independent standard
normals? This problem is solved by two different approximation
methods.

First, we show that the variation distance between the joint distribution
of entries of $Z_n$ and that of $p_n q_n$ independent standard normals
goes to zero provided $p_n = o(\sqrt{n})$ and $q_n = o(\sqrt{n})$. We also show
that the above variation distance does not go to zero if $p_n = [x\sqrt{n}]$
and $q_n = [y\sqrt{n}]$ for any positive numbers $x$ and $y$. This says that
the largest orders of $p_n$ and $q_n$ are $o(n^{1/2})$ in the sense of the above
approximation.

Second, suppose $\Gamma_n = (\gamma_{ij})_{n\times n}$ is generated by performing the
Gram-Schmidt algorithm on the columns of $Y_n = (y_{ij})_{n\times n}$, where
$\{y_{ij}; 1 \le i, j \le n\}$ are i.i.d. standard normals. We show that
$\varepsilon_n(m) := \max_{1\le i\le n,\,1\le j\le m} |\sqrt{n}\,\gamma_{ij} - y_{ij}|$ goes to zero in probability as long as
$m = m_n = o(n/\log n)$. We also prove that $\varepsilon_n(m_n) \to 2\sqrt{\alpha}$ in probability
when $m_n = [n\alpha/\log n]$ for any $\alpha > 0$. This says that $m_n = o(n/\log n)$
is the largest order such that the entries of the first $m_n$
columns of $\Gamma_n$ can be approximated simultaneously by independent
standard normals.

1. Introduction. Let $\Gamma_n = (\gamma_{ij})$ be a random orthogonal matrix which is
uniformly distributed on the orthogonal group $O(n)$. Let $Z_n$ be the $p_n \times q_n$
upper left block of $\Gamma_n$, where $p_n$ and $q_n$ are two positive integers. The open
problem in Section 6.3 from [10] is as follows: what are the largest orders
of $p_n$ and $q_n$ such that the variation distance between the joint distribution
of the entries of $Z_n$ and that of $p_n q_n$ independent standard normals goes
to zero as $n \to \infty$? We answer this question here. Before stating the results
formally, let us first review some history of this problem.

In studying "Equivalence of Ensembles" in statistical mechanics, Borel [5]
showed that

(1.1)  $$P(\sqrt{n}\,\gamma_{11} \le x) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$$

as $n \to \infty$ for any real number $x$. For more information about this formula,
one is referred to [27], page 197 in [26], page 412 in [23], page 342 in [4], [11]
and [24, 25].

Similar results for fixed $m$ are derived through Brownian motion by Gallardo
[16] and Yor [32]. Let $\gamma_1$ be the first column of $\Gamma_n$. Stam [30] proved
that $d_m$, the variation distance between the distribution of the first $m$ coordinates
of $\sqrt{n}\,\gamma_1$ and the distribution of $m$ independent standard normals, goes
to zero provided $m = o(\sqrt{n})$ as $n \to \infty$. He applied this result to a geometric
probability problem.

In studying a finite representation theorem of the de Finetti type, Diaconis
and Freedman [11] showed that the above $d_m$ goes to zero as $n \to \infty$ provided
$m = o(n)$. On the other hand, in studying a de Finetti-type theorem on a
finite sequence of orthogonal invariant random vectors, Diaconis, Eaton and
Lauritzen [14] proved the following.

Theorem A.1. For each $n \ge 1$, let $Z_n$ be the $p_n \times q_n$ upper left block of
a random matrix $\Gamma_n$ which is uniformly distributed on the orthogonal group
$O(n)$. Let also $\delta_n$ be the variation distance between the distribution of the
$p_n q_n$ entries of $Z_n$ and the joint distribution of $p_n q_n$ independent standard
normals. Then $\delta_n \to 0$ if $p_n = o(n^{\alpha})$ and $q_n = o(n^{\alpha})$ for $\alpha = 1/3$.

Since the publication of [14], there have been various speculations on
the maximum value of $\alpha$ that makes the variation distance go to zero. Here are
three major ones: (a) $p_n = O(n^{1/3})$ and $q_n = O(n^{1/3})$; (b) $p_n = o(n^{1/2})$ and
$q_n = o(n^{1/2})$; (c) $p_n = o(n)$ and $q_n = o(n)$. Recently Collins [7] showed that
the variation distance in Theorem A.1 goes to zero when $p_n = O(n^{1/3})$ and
$q_n = O(n^{1/3})$.

Attempts to improve on the orders of $p_n$ and $q_n$ are partly motivated by
the following reasons. First, it is well known that the above $\Gamma_n$ is close to $\Gamma'_n$,
an $n \times n$ matrix with independent normals as entries. Mathematically, it is
interesting to know in what sense they are close. Diaconis and Shahshahani
[12], Diaconis and Evans [13] and Rains [28] characterized relationships between
the traces of $\Gamma_n$ and those of $\Gamma'_n$ in terms of expectations; Johansson
[20] obtained the speed of convergence of traces of $\Gamma_n$ to a normal random
variable; D'Aristotile, Diaconis and Newman [8] showed that the linear
combination of entries of $\Gamma_n$ also converges weakly to a normal distribution.
Second, improving the orders of $p_n$ and $q_n$ has a lot of applications; see [14]
and [19]. In the last paper Jiang also proved the following coupling result.

Theorem A.2. For each $n \ge 2$, there exist matrices $\Gamma_n = (\gamma_{ij})_{1\le i,j\le n}$
and $\Gamma'_n = (\gamma'_{ij})_{1\le i,j\le n}$ whose $2n^2$ elements are random variables defined on
the same probability space such that:

(i) the law of $\Gamma_n$ is the normalized Haar measure on the orthogonal group $O(n)$;

(ii) $\{\gamma'_{ij}; 1 \le i, j \le n\}$ are independent standard normals;

(iii) set $\varepsilon_n(m) = \max_{1\le i\le n,\,1\le j\le m} |\sqrt{n}\,\gamma_{ij} - \gamma'_{ij}|$ for $m = 1, 2, \ldots, n$. Then

$$\varepsilon_n(m_n) \to 0 \quad \text{in probability}$$

as $n \to \infty$ provided $m_n = o(n/(\log n)^2)$.

It says that $n^2/(\log n)^2$ elements of $\Gamma_n$ can be approximated by the
corresponding elements of $\Gamma'_n$ in terms of convergence in probability, which is
weaker than the convergence in variation norm.

This theorem highlights the interest in improving the orders of $p_n$ and
$q_n$. It seems to suggest that Theorem A.1 holds for much larger $p_n$ and $q_n$.
This is why people conjectured that the maximum orders of $p_n$ and $q_n$ are
$o(n)$. At the same time it would be interesting to know the largest order of
$m_n$ such that Theorem A.2 holds.

In this paper we prove that the maximum value of $\alpha$ as in Theorem
A.1 is actually $1/2$, and the largest order of $m_n$ such that $\varepsilon_n(m_n) \to 0$ in
probability is $o(n/\log n)$, where $\varepsilon_n(m_n)$ is as in Theorem A.2. To state our
results formally, let us recall the definition of variation distance first.

Let $\mu$ and $\nu$ be two probability measures on $(\mathbb{R}^m, \mathcal{B})$, where $\mathcal{B}$ is the Borel
$\sigma$-algebra. The variation distance between $\mu$ and $\nu$, denoted by $\|\mu - \nu\|$, is
equal to

(1.2)  $$\|\mu - \nu\| = 2 \cdot \sup_{A \in \mathcal{B}} |\mu(A) - \nu(A)| = \int_{\mathbb{R}^m} |f(x) - g(x)|\,dx_1\,dx_2 \cdots dx_m,$$

provided $\mu$ and $\nu$ have density functions $f(x)$ and $g(x)$ with respect to
the Lebesgue measure, respectively. For each $n \ge 1$, suppose that $Z_n$ is the
$p_n \times q_n$ upper left block of a random matrix $\Gamma_n$ which is uniformly distributed
on the orthogonal group $O(n)$. Let $G_n$ be the joint distribution of
$p_n q_n$ independent standard normals. We use $\mathcal{L}(\sqrt{n} Z_n)$ to represent the joint
probability distribution of the $p_n q_n$ random entries of $\sqrt{n} Z_n$. It is not difficult
to see that $\|\mathcal{L}(\sqrt{n} Z_n) - G_n\|$ is nondecreasing in $p_n$ and $q_n$, respectively.
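The two expressions in (1.2) can be compared on a toy case. The sketch below is only an illustration under an assumption that is not in the text: it replaces densities on $\mathbb{R}^m$ by probability mass functions on a small finite set, so that the supremum over sets can be found by brute force. The dictionaries `mu` and `nu` are made-up examples.

```python
from itertools import combinations

def variation_distance_sup(mu, nu):
    """2 * sup_A |mu(A) - nu(A)|, by brute force over all subsets A of the support."""
    points = sorted(set(mu) | set(nu))
    best = 0.0
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            gap = abs(sum(mu.get(a, 0.0) - nu.get(a, 0.0) for a in subset))
            best = max(best, gap)
    return 2.0 * best

def variation_distance_l1(mu, nu):
    """Sum of |f - g| over the support: the discrete analogue of the integral in (1.2)."""
    points = set(mu) | set(nu)
    return sum(abs(mu.get(a, 0.0) - nu.get(a, 0.0)) for a in points)

# Hypothetical example distributions (not taken from the paper).
mu = {0: 0.5, 1: 0.3, 2: 0.2}
nu = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}

print(variation_distance_sup(mu, nu), variation_distance_l1(mu, nu))
```

With this normalization the distance takes values in $[0, 2]$, which is why the factor 2 appears in front of the supremum.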

Theorem 1. If $p_n = o(\sqrt{n})$ and $q_n = o(\sqrt{n})$ as $n \to \infty$, then

$$\lim_{n\to\infty} \|\mathcal{L}(\sqrt{n} Z_n) - G_n\| = 0.$$

As usual, the notation $[a]$ stands for the integer part of a positive number $a$.

Theorem 2. Let $x > 0$ and $y > 0$ be two numbers and $p_n = [x n^{1/2}]$ and
$q_n = [y n^{1/2}]$. Then

$$\liminf_{n\to\infty} \|\mathcal{L}(\sqrt{n} Z_n) - G_n\| \ge \phi(x, y) > 0,$$

where $\phi(x, y) := E\bigl|\exp\bigl(-\tfrac{x^2y^2}{8} + \tfrac{xy}{4}\,\xi\bigr) - 1\bigr| \in (0, 1)$ and $\xi$ is a standard normal.

One can see that $\phi(0, 0) = 0$, which roughly reflects the flavor of Theorem 1.
This is rigorously true if the conclusion in Theorem 2 is replaced
by $\lim_{n\to\infty} \|\mathcal{L}(\sqrt{n} Z_n) - G_n\| = \phi(x, y)$. A further analysis shows that the
inequality in the theorem is actually strict.
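The lower bound $\phi(x, y)$ in Theorem 2 is a one-dimensional Gaussian expectation, so it can be evaluated numerically. The following is a minimal sketch, assuming a plain midpoint rule on a truncated interval is accurate enough for moderate values of $xy$; the truncation width and the number of steps are arbitrary choices made here, not values from the paper.

```python
import math

def phi(x, y, half_width=10.0, steps=20000):
    """Midpoint-rule approximation of E|exp(-x^2 y^2 / 8 + (x y / 4) xi) - 1|, xi ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    shift = -((x * y) ** 2) / 8.0
    slope = (x * y) / 4.0
    total = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += abs(math.exp(shift + slope * z) - 1.0) * density * h
    return total

for x, y in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]:
    print(x, y, phi(x, y))
```

The printed values start at zero for $x = y = 0$ and stay below one, in line with the statement $\phi(x, y) \in (0, 1)$ for positive $x$ and $y$.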

Why are the maximum orders of $p_n$ and $q_n$ equal to $o(n^{1/2})$ as shown in
Theorems 1 and 2?

There are two reasons. First, Diaconis and Freedman [11] showed that
the variation distance between the distribution of the $o(n)$ entries of the
first column of $\Gamma_n$ and that of independent normals goes to zero. We know
that $Z_n$, a $p_n$ by $q_n$ submatrix of $\Gamma_n$, has $p_n q_n$ elements. One can guess
that the number of approximated entries is fixed (loosely speaking). So
the largest $\alpha$ in $p_n = o(n^{\alpha})$ and $q_n = o(n^{\alpha})$ must be $1/2$. Second, we can see
this mathematically. Let $f_n(z)$ and $g_n(z)$ be the density functions of $\sqrt{n} Z_n$
and $G_n$, respectively. By (1.2),

(1.3)  $$\|\mathcal{L}(\sqrt{n} Z_n) - G_n\| = \int \left|\frac{f_n(z)}{g_n(z)} - 1\right| g_n(z)\,dz = E\left|\frac{f_n(X_n)}{g_n(X_n)} - 1\right|,$$

where the integration region in the first integral is $\mathbb{R}^{p_n q_n}$, and the $p_n q_n$ entries
of the matrix $X_n$ are independent standard normals. The term $f_n(X_n)/g_n(X_n)$,
as will be shown later, converges weakly to a lognormal distribution when
both $p_n$ and $q_n$ are of order $n^{1/2}$; $f_n(X_n)/g_n(X_n)$ converges to one when both
$p_n$ and $q_n$ are of order $o(n^{1/2})$.

Now we consider the approximation method as in Theorem A.2.

Let $Y_n = (y_{ij})_{1\le i,j\le n}$, where $y_{ij}$'s are independent standard normals. Let
also $\Gamma_n = (\gamma_{ij})_{1\le i,j\le n}$ be the orthogonal matrix obtained from performing
the Gram-Schmidt procedure on the columns of $Y_n$ (the procedure is briefly
reviewed at the beginning of Section 3). Define

$$\varepsilon_n(m) = \max_{1\le i\le n,\,1\le j\le m} |\sqrt{n}\,\gamma_{ij} - y_{ij}|.$$

We have the following theorem. 

Theorem 3. Let $\{m_n < n;\ n \ge 1\}$ be a sequence of positive integers.
Then:

(i) the matrix $\Gamma_n$ is Haar invariant on the orthogonal group $O(n)$;

(ii) $\varepsilon_n(m_n) \to 0$ in probability, provided $m_n = o(n/\log n)$ as $n \to \infty$;

(iii) for any $\alpha > 0$, we have that $\varepsilon_n([n\alpha/\log n]) \to 2\sqrt{\alpha}$ in probability as
$n \to \infty$.



This theorem tells us that the maximum order of $m_n$ such that $\varepsilon_n(m_n) \to 0$
in probability is $m_n = o(n/\log n)$, where the typical orthogonal matrix
$\Gamma_n$ is obtained through performing the Gram-Schmidt procedure for a
matrix whose elements are independent standard normals.
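The coupling behind Theorem 3 can be simulated directly. The sketch below draws a Gaussian matrix, orthonormalizes its columns and evaluates $\varepsilon_n(m)$ for a few values of $m$. Two things here are assumptions of this illustration rather than part of the paper: the code uses the modified Gram-Schmidt update (which agrees with the classical procedure in exact arithmetic and is usually more stable in floating point), and the size `n = 60` and the random seed are arbitrary.

```python
import math
import random

def gram_schmidt_columns(cols):
    """Orthonormalize a list of column vectors with modified Gram-Schmidt."""
    gammas = []
    for y in cols:
        w = list(y)
        for g in gammas:
            coef = sum(wi * gi for wi, gi in zip(w, g))
            w = [wi - coef * gi for wi, gi in zip(w, g)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        gammas.append([wi / norm for wi in w])
    return gammas

def epsilon_n(cols, gammas, m):
    """max over rows i and the first m columns j of |sqrt(n) * gamma_ij - y_ij|."""
    n = len(cols)
    root_n = math.sqrt(n)
    return max(
        abs(root_n * gammas[j][i] - cols[j][i])
        for j in range(m)
        for i in range(n)
    )

rng = random.Random(0)  # fixed seed, illustrative only
n = 60
Y = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]  # Y[j] is column j
Gamma = gram_schmidt_columns(Y)
eps_by_m = [epsilon_n(Y, Gamma, m) for m in (1, 5, 20, 60)]
print(eps_by_m)
```

Since the maximum runs over a growing set of columns, $\varepsilon_n(m)$ is nondecreasing in $m$; a single small simulation only illustrates the definition and says nothing about the asymptotic rates in the theorem.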

We prove Theorems 1 and 2 in Section 2. Theorem 3 is proved in Section 
3. Technical lemmas used in Sections 2 and 3 are given in Section 4. At last, 
a couple of known results needed for the proof of Theorem 3 are listed in 
the Appendix. 



2. The proofs of Theorems 1 and 2. First we list some lemmas needed 
for the proofs of Theorems 1 and 2. The proofs of these lemmas are listed 
in Section 4.1. 



Lemma 2.1. Let $\Gamma(x)$, $x > 0$, be the standard Gamma function. Then:

(,) 1 1<E(!L±£Z2) <1 forall n>l; 
on yni (n) 

r((n + l)/2) 



(ii) 



vV2r(n/2) 



1 



< 



5n 



for all n > 1. 
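The Gamma ratio that Lemma 2.1 is applied to later, in the factors of $C_n$ in the proof of Lemma 2.6, is $\Gamma((n+1)/2)/(\sqrt{n/2}\,\Gamma(n/2))$. The sketch below is a numerical look at that ratio only. It does not reproduce the constants of the lemma; the behaviour it checks (ratio below one, with a gap of roughly $1/(4n)$) comes from the standard large-argument asymptotics of the Gamma function and is stated here as an assumption, not as the lemma's wording.

```python
import math

def gamma_ratio(n):
    """Gamma((n+1)/2) / (sqrt(n/2) * Gamma(n/2)), computed on the log scale."""
    log_ratio = (
        math.lgamma((n + 1) / 2.0)
        - math.lgamma(n / 2.0)
        - 0.5 * math.log(n / 2.0)
    )
    return math.exp(log_ratio)

# Each row: n, the ratio, and n * (1 - ratio), which should settle near 1/4.
rows = [(n, gamma_ratio(n), n * (1.0 - gamma_ratio(n))) for n in (1, 2, 10, 100, 1000, 10000)]
for row in rows:
    print(row)
```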



Lemma 2.2. Let $f(u, v)$ be a real-valued function. Suppose the three
second-order derivatives of $f$ exist and are bounded below and above by $-M$ and
$M$, respectively, over $[a, b] \times [c, d]$. Then



1 n 12 fi i 

^E E/f 1 *- 



3=3i s=»i 



ii/ra 



J2 «2 



dx dy 



3=3i »=*1 



J 2 «2 



1 I 

n ' n 



2n 3 E E M „> „ ) + e ' 



n' n 



where $|\epsilon| \le (i_2 - i_1)(j_2 - j_1)M/n^4$ for any $i_1, i_2, j_1$ and $j_2$ such that $na \le
i_1 < i_2 \le nb - 1$ and $nc \le j_1 < j_2 \le nd - 1$.
We will use the following setting a couple of times.

(2.1)  Let $X = (x_{ij})$ be a $p$ by $q$ matrix, where $\{x_{ij};\ 1 \le i \le p,\ 1 \le j \le q\}$ are
i.i.d. standard normals. Let $\lambda_1, \lambda_2, \ldots, \lambda_q$ be the eigenvalues of $X'X$.

A sequence $\{X_n; n \ge 1\}$ will be studied, where $X_n$ is of the above setting for
each $n$. We still use notation $X$ for $X_n$ sometimes when there is no confusion.

The next lemma is a standard result when using the moment method to
show weak convergence of certain functions of eigenvalues of matrices with
independent and identically distributed random variables as entries. It
can be seen from, for example, (2.15) and (2.16) in [3].

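Lemma 2.3. Let $\{p_n; n \ge 1\}$ and $\{q_n; n \ge 1\}$ be two sequences of positive
integers such that $p_n \to \infty$ and $p_n/q_n \to \eta \in (0, \infty)$. For each $n$, assume the
setting in (2.1) with $p = p_n$ and $q = q_n$. The following two statements hold
for each integer $k \ge 1$:

(i)  $$E\bigl(\operatorname{tr}((X_n'X_n)^k)\bigr) \sim p_n^{k} q_n \sum_{r=0}^{k-1} \frac{1}{r+1} \left(\frac{q_n}{p_n}\right)^{r} \binom{k}{r}\binom{k-1}{r}$$

as $n \to \infty$;

(ii)  $$\frac{\operatorname{tr}((X_n'X_n)^k)}{q_n^{k+1}} \to \sum_{r=0}^{k-1} \frac{\eta^{k-r}}{r+1} \binom{k}{r}\binom{k-1}{r}$$

in probability as $n \to \infty$.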
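The leading-order moment formula of Lemma 2.3 can be cross-checked against the low-order values used later in the proofs. The sketch below assumes that part (i) has the form $E\operatorname{tr}((X'X)^k) \sim \sum_{r=0}^{k-1} \frac{1}{r+1}\binom{k}{r}\binom{k-1}{r}\, p^{k-r} q^{r+1}$; this reading is inferred from how the lemma is applied for $k = 3$ and is an assumption of the example. The function name `leading_trace_moment` is hypothetical.

```python
from fractions import Fraction
from math import comb

def leading_trace_moment(p, q, k):
    """Leading-order polynomial for E tr((X'X)^k), X a p-by-q standard Gaussian matrix."""
    total = Fraction(0)
    for r in range(k):
        weight = Fraction(comb(k, r) * comb(k - 1, r), r + 1)
        total += weight * p ** (k - r) * q ** (r + 1)
    return total

p, q = 7, 5  # arbitrary test sizes
print(leading_trace_moment(p, q, 1))  # compare with pq
print(leading_trace_moment(p, q, 2))  # compare with pq(p + q)
print(leading_trace_moment(p, q, 3))  # compare with pq(p^2 + q^2 + 3pq)
```

For $k = 2$ the exact mean used later is $pq(p + q + 1)$, so the polynomial above captures only the leading term, as the asymptotic equivalence in the lemma suggests.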

Lemma 2.4. Let $\epsilon \in (0, 1)$. Let $\{p_n; n \ge 1\}$ and $\{q_n; n \ge 1\}$ be two sequences
of positive integers such that $\epsilon \le p_n/q_n \le \epsilon^{-1}$ for all $n \ge 1$. For
each $n$, assume the setting in (2.1) with $p = p_n$ and $q = q_n$. Assume $p_n \to \infty$
as $n \to \infty$. Then:

(i) $\operatorname{Var}(\operatorname{tr}((X_n'X_n)^2)) \sim p_n^2 q_n^2 + 8 p_n q_n (p_n + q_n)^2$ as $n \to \infty$;

(ii) $\operatorname{Cov}(\operatorname{tr}(X_n'X_n), \operatorname{tr}((X_n'X_n)^2)) \sim 4 p_n q_n (p_n + q_n)$ as $n \to \infty$.

The following lemma is Proposition 2.1 from [14] or Proposition 7.3 from
[15]. This is the starting point of the proofs of Theorems 1 and 2.



Lemma 2.5. Let $U$ be an $n$ by $n$ random matrix which is uniformly
distributed on the orthogonal group $O(n)$ and let $Z$ be the upper left $p \times q$
corner block of $U$. If $p + q \le n$ and $q \le p$, then the joint density function of
entries of $Z$ is

(2.2)  $$f(z) = (\sqrt{2\pi})^{-pq}\, \frac{\omega(n-p, q)}{\omega(n, q)}\, \bigl\{\det(I_q - z'z)^{(n-p-q-1)/2}\bigr\}\, I_0(z'z),$$

where $I_0(z'z)$ is the indicator function of the set that all $q$ eigenvalues of $z'z$
are in $(0, 1)$, and $\omega(\cdot, \cdot)$ is the Wishart constant defined by

$$\frac{1}{\omega(r, s)} = \pi^{s(s-1)/4}\, 2^{rs/2} \prod_{j=1}^{s} \Gamma\!\left(\frac{r - j + 1}{2}\right).$$

Here $s$ is a positive integer and $r$ is a real number, $r > s - 1$. When $p < q$,
the density of $Z$ is obtained by interchanging $p$ and $q$ in the above Wishart
constant.

To simplify notation, when there is no confusion, we write $p$ for $p_n$ and $q$
for $q_n$.

Let $g(z)$ be the joint density function of entries of $X = (x_{ij})_{p\times q}$, where
$x_{ij}$'s are independent standard normals. So, $g(z) = (2\pi)^{-pq/2} \exp(-\operatorname{tr}(z'z)/2)$,
where $z$ is a $p$ by $q$ matrix. We need to understand the ratio $f(z)/g(z)$ in
later proofs. Assuming the $pq$ entries of $z$ are independent standard normals,
then $f(z)/g(z)$ can be written as a product of a constant part and a random
part. They are analyzed in the following two lemmas.

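Lemma 2.6. Given $x > 0$ and $y > 0$, let $p = p_n = [x n^{1/2}]$ and $q = q_n =
[y n^{1/2}]$. Set

$$K_n = \left(\frac{2}{n}\right)^{pq/2} \prod_{j=1}^{q} \frac{\Gamma((n - j + 1)/2)}{\Gamma((n - p - j + 1)/2)}.$$

Then

(2.3)  $$K_n = \exp\left\{-\left(\frac{p^2 q + p q^2}{4n} + \frac{xy}{4} + \frac{2x^3 y + 2x y^3 + 3x^2 y^2}{24}\right) + o(1)\right\}$$

as $n$ is sufficiently large.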

Proof. Suppose $p = 2k$. Using the fact that $\Gamma(x + 1) = x\Gamma(x)$, we have
that


(2.4) 



where 



mi: - ' 



j=Xi=l 



n 



e Bn , 



2i + j 



i=0i=l 






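Let $f(s, t) = \log(1 - 2s - t)$ with $2s + t < 1$. Then $f'_s(s, t) = -2/(1 - 2s - t) =
-2 + O(n^{-1/2})$, $f'_t(s, t) = -1/(1 - 2s - t) = -1 + O(n^{-1/2})$ and

$$\left|\frac{\partial^2 f}{\partial s^2}\right| = \frac{4}{(1 - 2s - t)^2} \le 5, \qquad \left|\frac{\partial^2 f}{\partial t^2}\right| = \frac{1}{(1 - 2s - t)^2} \le 5 \qquad \text{and} \qquad \left|\frac{\partial^2 f}{\partial s\,\partial t}\right| = \frac{2}{(1 - 2s - t)^2} \le 5$$

for all $(s, t) \in [0, p/n] \times [0, q/n]$, as $n$ is sufficiently large. By Lemma 2.2,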



B n = n z 



q/n f(k+l)/n 



3kq 



(2.5) 



Jl/n 
v ru 



log(l - 2s - t) ds dt + — - + 0[ —= 



2n 



1 



— / / log(l + s + 1) ds dt 
2 Jo Jo 



rr 



2 r v r-2/n 



3xy 



2 Jo Jo 



log(l + s + 1) dsdt + — - + O [-= ), 



as n is sufficiently large, where u = — (p + 2)/n, v = —q/n. We now estimate 
the above integral. By Taylor's expansion, there exists 5 > such that 



log(l + s + i) - i^(s + t) 
for all s and t such that s + t £ (0,5). Thus, 
log(l + s + t) ds dt 

(2.6) 



(s + tf 



<(s + tf 



nu rv rii 

(s + t)dsdt-\j I (s + tfdsdt 
Jo Jo 

(s + tfdsdt, 



as both u and v are in (0,6/2). It is trivial to verify that 



$$\int_0^u\!\!\int_0^v (s + t)^k\,ds\,dt = \frac{1}{(k+1)(k+2)} \bigl\{(u + v)^{k+2} - u^{k+2} - v^{k+2}\bigr\}$$

for $k \ge 0$. Plugging this into (2.6), we obtain

$$\int_0^u\!\!\int_0^v \log(1 + s + t)\,ds\,dt = \frac{u^2 v + u v^2}{2} - \frac{2u^3 v + 2u v^3 + 3u^2 v^2}{12} + O((u + v)^5),$$

as $n \to \infty$. [The actual formula for the integral is

$$\int_0^u\!\!\int_0^v \log(1 + s + t)\,ds\,dt = \tfrac12 (1 + u + v)^2 \log(1 + u + v) - \tfrac12 (1 + u)^2 \log(1 + u) - \tfrac12 (1 + v)^2 \log(1 + v) - \tfrac32 uv.]$$
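The integral identities used at this step are easy to test numerically. The sketch below compares a midpoint-rule double integral with the closed forms $\{(u+v)^{k+2} - u^{k+2} - v^{k+2}\}/((k+1)(k+2))$ and $\tfrac12(1+u+v)^2\log(1+u+v) - \tfrac12(1+u)^2\log(1+u) - \tfrac12(1+v)^2\log(1+v) - \tfrac32 uv$. The sample values of `u` and `v` are arbitrary small negative numbers, chosen only because $u$ and $v$ are negative in the proof; the grid size is also an arbitrary choice.

```python
import math

def double_midpoint(func, u, v, steps=400):
    """Midpoint-rule approximation of the iterated integral over s in [0, u], t in [0, v]."""
    hs, ht = u / steps, v / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * hs
        for j in range(steps):
            t = (j + 0.5) * ht
            total += func(s, t)
    return total * hs * ht

def power_integral(u, v, k):
    """Closed form for the double integral of (s + t)^k."""
    return ((u + v) ** (k + 2) - u ** (k + 2) - v ** (k + 2)) / ((k + 1) * (k + 2))

def log_integral(u, v):
    """Closed form for the double integral of log(1 + s + t)."""
    def half_sq_log(w):
        return 0.5 * w * w * math.log(w)
    return half_sq_log(1 + u + v) - half_sq_log(1 + u) - half_sq_log(1 + v) - 1.5 * u * v

def log_integral_second_order(u, v):
    """Two-term expansion of the same integral for small u and v."""
    return (u * u * v + u * v * v) / 2.0 - (2 * u**3 * v + 2 * u * v**3 + 3 * u * u * v * v) / 12.0

u, v = -0.03, -0.02  # illustrative values only
numeric_log = double_midpoint(lambda s, t: math.log(1.0 + s + t), u, v)
print(numeric_log, log_integral(u, v), log_integral_second_order(u, v))
```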

Now substituting $u = -(p + 2)/n$ and $v = -q/n$ back into the two integrals
in (2.5), we have that

log(l + s + t)dsdt 



n 2 



2 

(2.7) 

1 



p 2 q + pg 2 y 2 2xy 3 + 2x 3 y + 3x 2 y 2 



+ xy + — + 



4n " 2 24 

and 

n 2 p V p-2/n y 2 / 1 

(2.8) — / / log(l + s + t)dsdt = -^ + 0[-= 



+ 0[ — 

n 



'o Jo 

as n is sufficiently large. Combining (2.4), (2.5), (2.7) and (2.8), we obtain 

(2.9)  $$K_n = \exp\left\{-\left(\frac{p^2 q + p q^2}{4n} + \frac{xy}{4} + \frac{2x^3 y + 2x y^3 + 3x^2 y^2}{24}\right) + O(n^{-1/2})\right\}$$

as $n$ is sufficiently large.

Now, suppose $p = 2k - 1$. Let

$$C_n = \prod_{j=1}^{q} \frac{\Gamma((n - j - p + 1)/2)}{\Gamma((n - j - p)/2)\,\sqrt{(n - j - p)/2}}.$$

By Lemma 2.1, the jth term in the product, say, C n j, has the following 
property: 

1 " < C n .j < 1 + 

n — p — q n — p — q 

for all j = 1, 2, ... ,q as long as p + q < n — 3. Therefore, 

V < C n <(l + 

n — p — qj \ n — p 

Since $(1 + x_n)^{k_n} = 1 + O(k_n x_n)$ as $x_n \to 0$, $k_n \to \infty$ and $k_n x_n \to 0$, it follows
that $C_n = 1 + O(n^{-1/2})$, provided $p = O(\sqrt{n})$ and $q = O(\sqrt{n})$. So



C n \nj ^ r((n - j - 2k + l)/2)V(n -j-2fc + l)/2 

nn^^} ■ {n ^^r 1/2 -< ■ 
where the fact $\Gamma(x + 1) = x\Gamma(x)$ is used in the second step. Now

1 ' / ?' + 2k - 1 

iog^=-2giog(i-^— 
(2 .io) = _Lg 0+2)! _ 1 ) +o (_L 

_ 3/ 2 + 2xy | ^ 1 



as $n \to \infty$. In notation, $K'_n$ is identical to $K_n$ in (2.4). Keep in mind that
the $k$ in (2.4) is equal to $p/2$; but the $k$ in the definition of $K'_n$ is equal to
$(p + 1)/2$. Apply (2.9) to $K'_n$ to obtain

, (p+l) 2 g+(p+l)g 2 , ^ , 2x 3 y + 2xy 3 + 3x 2 y 2 _ 1/2 

-logA = h —r H — hO(n 7 ) 

n 4n 4 24 v 7 

p 2 q+pq 2 3xy + y 2 2x 3 y + 2xy 3 + 3x 2 y 2 _ 1/2 

= 3 1 3 1 ^ hO n 7 ). 

4n 4 24 

This together with (2.10) thus yields (2.3). □ 

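Lemma 2.7. Suppose $x > 0$ and $y > 0$. For each $n \ge 1$, assume the setting
in (2.1) with $p = p_n = [x\sqrt{n}]$ and $q = q_n = [y\sqrt{n}]$. Define

$$L_n = \left\{\prod_{i=1}^{q} \left(1 - \frac{\lambda_i}{n}\right)\right\}^{(n-p-q-1)/2} \exp\left(\frac12 \sum_{i=1}^{q} \lambda_i\right) I(0 < \lambda_1, \lambda_2, \ldots, \lambda_q < n).$$

Then, $e^{-a_n} L_n$ converges weakly to the distribution of $e^{\sigma\xi}$, where $\xi$ is a standard
normal, and

$$a_n = \frac{p^2 q + p q^2}{4n} + \frac{3xy + x^3 y + x y^3}{12} \qquad \text{and} \qquad \sigma = \frac{xy}{4}.$$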



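Proof. Set

(2.11)  $$f(x) = \begin{cases} \dfrac{x}{2} + \dfrac{n - p - q - 1}{2} \log\left(1 - \dfrac{x}{n}\right), & \text{if } 0 < x < n, \\ -\infty, & \text{otherwise.} \end{cases}$$

Then, $L_n = \exp\bigl(\sum_{i=1}^{q} f(\lambda_i)\bigr)$. For any $x \in (0, n)$, by Taylor's expansion, there
exists $\xi = \xi_x \in (0, x)$ such that

$$\log\left(1 - \frac{x}{n}\right) = -\frac{x}{n} - \frac{x^2}{2n^2} - \frac{x^3}{3n^3} - \frac{x^4}{4}\cdot\frac{1}{(\xi - n)^4}.$$

Then

(2.12)  $$f(x) = \frac{p + q + 1}{2n}\,x - \frac{n - p - q - 1}{4n^2}\,x^2 - \frac{n - p - q - 1}{6n^3}\,x^3 + \frac{g_n(x)}{n^3}\,x^4, \qquad x \in (0, n),$$

where $g_n(x) = -n^3(n - p - q - 1)/(8(\xi - n)^4)$. It is trivial to see that
$\sup_{0 < x < an} |g_n(x)| \le (1 - a)^{-4}$ for any $a \in (0, 1)$. Recall that $\lambda_1, \lambda_2, \ldots, \lambda_q$ are
eigenvalues of $X_n'X_n$, where the entries of the $p \times q$ matrix $X_n$ are independent
standard normals. Note that $p \sim x\sqrt{n}$ and $q \sim y\sqrt{n}$. By the Theorem
from [17] or Theorem 3.1 from [31], there exists a constant $c(x, y) \in (0, \infty)$
such that

(2.13)  $$\frac{\max_{1\le i\le q} \lambda_i}{\sqrt{n}} \to c(x, y)$$

in probability as $n \to \infty$. Define $\Omega_n := \{\max_{1\le i\le q} \lambda_i < (c(x, y) + 1)\sqrt{n}\}$.
Then

(2.14)  $$P(\Omega_n^c) \to 0$$

as $n \to \infty$. Now on $\Omega_n$, by (2.12),

(2.15)  $$\sum_{i=1}^{q} f(\lambda_i) = \frac{p + q + 1}{2n} \operatorname{tr}(X'X) - \frac{n - p - q - 1}{4n^2} \operatorname{tr}((X'X)^2) - \frac{n - p - q - 1}{6n^3} \operatorname{tr}((X'X)^3) + \frac{g_n}{n^3} \operatorname{tr}((X'X)^4),$$

where $|g_n| \in [0, 2)$, as $n$ is sufficiently large. Note that $\operatorname{tr}((X'X)^i)$ are well-defined
random variables which do not depend on $\Omega_n$. Easily, $E(\operatorname{tr}(X'X)) =
pq$. By Lemma 2.3,

$$E\operatorname{tr}((X'X)^3) \sim pq(p^2 + q^2 + 3pq) \qquad \text{and} \qquad E\operatorname{tr}((X'X)^4) \le C(x, y)\,q^5$$

for some constant $C(x, y)$. It is easy to check that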

tr((X'X) 2 ) = ££ 4 + £ E 44 
j=li=l i=li=#=l 

P Q 

~H ^ ] ^ ] x ij x ik~^~ ^ ] XijXifcXikXij . 

Then $E\operatorname{tr}((X'X)^2) = pq(p + q + 1)$ [this is sharper than the one corresponding
to the case $k = 2$ in (i) of Lemma 2.3]. Now set $h_i = \operatorname{tr}((X'X)^i) -
E(\operatorname{tr}((X'X)^i))$ for $i = 1, 2, 3$. By simple algebra, we have from (2.15) that

V" fA \ _ P 2 <1+P<1 2 , 3xy + x 3 y + xy 3 ( 1 
V (Al) " 4n + 12 



p + q+l n-p-q-1 n-p-q-1 h 4 

H o "1 T~5 ft2 c - ^ ft 3 + 

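on $\Omega_n$ as $n \to \infty$. Recall that $L_n = \exp\bigl(\sum_{i=1}^{q} f(\lambda_i)\bigr)$. By (ii) of Lemma 2.3,
both $h_3/n^2$ and $h_4/n^3$ go to zero in probability. By (2.14), to prove the
lemma, it suffices to show that

(2.16)  $$W_n := \frac{p + q + 1}{2n}\,h_1 - \frac{n - p - q - 1}{4n^2}\,h_2 \quad \text{converges to } N(0, \sigma^2) \text{ weakly},$$

where $\sigma$ is as in the statement of the lemma. Since $\operatorname{tr}(X'X) = \sum_{i,j} x_{ij}^2$,
which is a sum of independent and identically distributed random variables,
$\operatorname{Var}(h_1) = \operatorname{Var}(\operatorname{tr}(X'X)) = 2pq$. Therefore, by Lemma 2.4, $\operatorname{Var}(h_2)/n^2$ converges
to a positive constant. By Theorem 4.1 from [21], $(h_1/\sqrt{\operatorname{Var}(h_1)},
h_2/\sqrt{\operatorname{Var}(h_2)})$ converges weakly to a normal distribution with mean zero. It
follows that $W_n$ converges weakly to a normal distribution with mean zero.
We only need to calculate the variance $\sigma^2$. Now,

$$\operatorname{Var}(W_n) = \frac{(p + q + 1)^2}{4n^2} \operatorname{Var}(\operatorname{tr}(X'X)) + \frac{(n - p - q - 1)^2}{16n^4} \operatorname{Var}(\operatorname{tr}((X'X)^2)) - \frac{(p + q + 1)(n - p - q - 1)}{4n^3} \cdot \operatorname{Cov}\bigl(\operatorname{tr}(X'X), \operatorname{tr}((X'X)^2)\bigr).$$

Since $\operatorname{Var}(\operatorname{tr}(X'X)) = 2pq$ as calculated earlier, by Lemma 2.4 again, the
above yields

$$\operatorname{Var}(W_n) \to \frac{x^2 y^2}{16}$$

as $n \to \infty$. Therefore, $\sigma^2 = \lim_{n\to\infty} \operatorname{Var}(W_n) = x^2 y^2/16$. The proof is completed. □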

Corollary 2.1. For $x > 0$ and $y > 0$, let $p_n = [x n^{1/2}]$ and $q_n = [y n^{1/2}]$.
Let $f_n(z)$ be the joint probability density function of $\sqrt{n} Z_n$ as in Theorem 1 and
$g_n(z)$ be the joint probability density function of $p_n q_n$ independent standard
normals. Then as $n \to \infty$,

$$\frac{f_n(X_n)}{g_n(X_n)} \quad \text{converges weakly to} \quad \exp\left\{-\frac{x^2 y^2}{8} + \frac{xy}{4}\,\xi\right\},$$

where $\xi$ and all the entries of $X_n$ are independent standard normals.






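Proof. Without loss of generality, we assume $y \le x$. Hence, $q_n \le p_n$ for
any $n \ge 1$. By Lemma 2.5, the density function of $\sqrt{n} Z_n$ is

$$f_n(z) := (\sqrt{2\pi})^{-pq}\, n^{-pq/2}\, \frac{\omega(n - p, q)}{\omega(n, q)} \left\{\det\left(I_q - \frac{z'z}{n}\right)\right\}^{(n-p-q-1)/2} I_0\!\left(\frac{z'z}{n}\right).$$

Obviously, $g_n(z) := (\sqrt{2\pi})^{-pq} e^{-\operatorname{tr}(z'z)/2}$. Let $\lambda_1, \lambda_2, \ldots, \lambda_q$ be the eigenvalues
of $X_n'X_n$. Then

$$\frac{f_n(X_n)}{g_n(X_n)} = K_n \cdot L_n,$$

where

(2.17)  $$K_n := \left(\frac{2}{n}\right)^{pq/2} \prod_{j=1}^{q} \frac{\Gamma((n - j + 1)/2)}{\Gamma((n - p - j + 1)/2)},$$

(2.18)  $$L_n := \left\{\prod_{i=1}^{q} \left(1 - \frac{\lambda_i}{n}\right)\right\}^{(n-p-q-1)/2} \exp\left(\frac12 \sum_{i=1}^{q} \lambda_i\right)$$

if all $\lambda_i$'s are in $(0, n)$, and $L_n$ is zero otherwise. The desired conclusion
immediately follows from Lemmas 2.6 and 2.7 on $K_n$ and $L_n$, respectively. □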



Proof of Theorem 2. First, we show that the lower bound is strictly
between zero and one. Recall $\phi(x, y) = E|\exp(-(x^2 y^2/8) + (xy\xi/4)) - 1|$.
Then $\phi(x, y) > 0$ because $\xi$ is a nondegenerate random variable. Second, by
Hölder's inequality,

$$\phi(x, y) \le \left\{E\left[\exp\left(-\frac{x^2 y^2}{8} + \frac{xy}{4}\,\xi\right) - 1\right]^2\right\}^{1/2}.$$

By expanding the square and using the fact that $E\exp(t\xi) = \exp(t^2/2)$ for
any $t \in \mathbb{R}$, we have that

$$\phi(x, y)^2 \le e^{-x^2 y^2/8} - 2e^{-3x^2 y^2/32} + 1.$$

Let $\psi(t) = e^{-t/8} - 2e^{-3t/32} + 1$ for $t \in \mathbb{R}$. Then $\psi(0) = 0$, $\psi(+\infty) = 1$ and
$\psi'(t) = (3/16)e^{-t/8}(e^{t/32} - (2/3)) > 0$ for any $t > 0$. Thus, $\phi(x, y) < 1$ for any
$x > 0$ and $y > 0$.

Now we prove the remaining part of Theorem 2.

Let us continue to use the notation in Corollary 2.1. First,

(2.19)  $$\|\mathcal{L}(\sqrt{n} Z_n) - G_n\| = \int_{\mathbb{R}^{pq}} \left|\frac{f_n(z)}{g_n(z)} - 1\right| g_n(z)\,dz = E\left|\frac{f_n(X_n)}{g_n(X_n)} - 1\right|,$$

where $X_n$ has the density function $g_n(z)$, that is, the $pq$ entries of $X_n$ are
independent standard normals. Second, by Corollary 2.1,

$$\frac{f_n(X_n)}{g_n(X_n)} \quad \text{converges weakly to} \quad \exp\left\{-\frac{x^2 y^2}{8} + \frac{xy}{4}\,\xi\right\},$$

where $\xi$ is a standard normal. Then, applying Fatou's lemma to (2.19),

$$\liminf_{n\to\infty} \|\mathcal{L}(\sqrt{n} Z_n) - G_n\| \ge E\left|\exp\left\{-\frac{x^2 y^2}{8} + \frac{xy}{4}\,\xi\right\} - 1\right|.$$

The proof is completed. □



Proof of Theorem 1. Let $p'_n = q'_n = p_n + q_n + [n^{1/4}]$. For an $n$ by $n$
random orthogonal matrix $U$ which has the normalized Haar measure, let
$Z_{p,q}$ denote the upper left $p$ by $q$ block of $U$, $1 \le p, q \le n$. Thus, $Z_{p_n,q_n}$ is a
sub-block of $Z_{p'_n,q'_n}$. As a consequence, the joint density function of entries of
$Z_{p_n,q_n}$ is a marginal density function of that of $Z_{p'_n,q'_n}$. Therefore, by formula
(1.2),

(2.20)  $$\|\mathcal{L}(\sqrt{n} Z_{p_n,q_n}) - G_{p_n q_n}\| \le \|\mathcal{L}(\sqrt{n} Z_{p'_n,q'_n}) - G_{p'_n q'_n}\|,$$

where $G_{pq}$ is the joint distribution of $pq$ independent standard normals [one
can verify this by choosing $B = A \times \mathbb{R}^{p'_n q'_n - p_n q_n}$ for any Borel set $A \subset \mathbb{R}^{p_n q_n}$
and then plugging them into definition (1.2)].

So, to prove the theorem, without loss of generality, we assume $p_n = q_n$
for all $n \ge 1$, $p_n \to \infty$ and $p_n = o(\sqrt{n})$.

As in the proof of Theorem 2,

$$\|\mathcal{L}(\sqrt{n} Z_{p_n,q_n}) - G_{p_n q_n}\| = E|K_n \cdot L_n - 1|,$$

where $K_n$ and $L_n$ are as in (2.17) and (2.18). By following the proof of
Lemma 2.6 step by step, we obtain that

(2.21)  $$K_n = \exp\left\{-\frac{p^2 q + p q^2}{4n} + o(1)\right\}$$

as $n \to \infty$. We claim that

(2.22)  $$e^{-(p^2 q + p q^2)/(4n)}\, L_n \to 1$$

in probability as $n \to \infty$. If this is true, then $K_n \cdot L_n \to 1$ in probability
as $n \to \infty$. Note that $K_n \cdot L_n \ge 0$ and it is easy to see that $E(K_n \cdot L_n) =
\int_{\mathbb{R}^{pq}} f_n(x)\,dx = 1$. These three facts imply that $\{K_n \cdot L_n\}$ is uniformly integrable,
that is, $\limsup_{t\to+\infty} \limsup_{n\to\infty} E(K_n L_n I\{K_n L_n > t\}) = 0$. It follows
that $E|K_n L_n - 1| \to 0$ as $n \to \infty$. The proof is then complete.

Now we prove claim (2.22). Let us go back to the proof of Lemma 2.7.
Since $p_n = q_n = o(\sqrt{n})$, the term $c(x, y)$ in (2.13) is equal to zero. So, correspondingly,
$\Omega_n = \{\max_{1\le i\le q} \lambda_i < \sqrt{n}\}$ and $P(\Omega_n^c) \to 0$ as $n \to \infty$. Recall
the definition of $f(x)$ in (2.11) and $L_n = \exp\bigl(\sum_{i=1}^{q} f(\lambda_i)\bigr)$. On $\Omega_n$, similar to
(2.15),

P + 1 + l ±tv'v\ n-p-q-1 (lv , v ^ 
^ /(Al) = 2n ( } 4^2 tT ^ X X ^ ) 



i=l 



(2.23) 



tr((X'X) 3 ) 



71- 

p 2 q + pc/ 2 ^ p + c? + 1 ^ n-p-q-1 ^ 
An 2n An 2 

tvgx'xf) 

i 9n 9 j 



where $g_n$ is a random variable satisfying $|g_n| \in [0, 2)$, as $n$ is sufficiently large,
and $h_i = \operatorname{tr}((X'X)^i) - E(\operatorname{tr}((X'X)^i))$. Obviously, $h_i$ is well defined on the same
probability space as those of $x_{ij}$'s which do not depend on $\Omega_n$. Note that
$p = q = o(\sqrt{n})$. Then

(2.24) £ fel = ^.^£Li(4- 1 )_ 0| 

n n p 

in probability as n — > oo by the classical central limit theorem of independent 
and identically distributed random variables. We will show next that the 
third term on the right-hand side of (2.23) also goes to zero in probability. 
Indeed, 

h 2 \ \ ^ Var(tr((X'X) 2 )) _ Q / ( M ) 2 + 8pg(p + q) 2 ' 



n / n 2 e 2 \ n 2 

by (i) of Lemma 2.4. This says that 

(2.25, ™^.,^„, 

in probability as n — > oo. Last, 

tr((X'X) 3 ) _ p 4 tr((X'X) 3 ) - E(tr((X'X) 3 )) £(tr((X'X) 3 )) 
n 2 ra 2 p 4 re 2 

By (ii) of Lemma 2.3, the first term on the right-hand side goes to zero in
probability. By (i) of Lemma 2.3, $E\operatorname{tr}((X'X)^3) \sim pq(p^2 + q^2 + 3pq)$ as $n \to \infty$.
So the second term on the right-hand side goes to zero. Consequently,

(2.26) - ; ; ->0 

n z 

in probability. Combining (2.23)-(2.26), we obtain 



in probability, which, together with the fact that $P(\Omega_n^c) \to 0$, implies (2.22). □

3. The proof of Theorem 3. The main tool of proving Theorem 3 is the
Gram-Schmidt algorithm. Let us briefly review it first.

Suppose $\{y_1, y_2, \ldots, y_n\}$ is a sequence of $n \times 1$ vectors. Set $w_1 = y_1$ and

(3.1)  $$w_j = y_j - \sum_{i=1}^{j-1} \frac{y_j^T w_i}{\|w_i\|^2}\, w_i, \qquad j = 2, 3, \ldots, n,$$

where $\|w_j\|^2 = w_j^T w_j$ ($j = 1, 2, \ldots, n$). Then, $\{w_j, 1 \le j \le n\}$ are orthogonal,
that is, $w_i^T w_j = 0$ for any $1 \le i < j \le n$. Let $\gamma_j = (1/\|w_j\|) w_j$, $j =
1, 2, \ldots, n$. Then the matrix $\Gamma_n = (\gamma_1, \gamma_2, \ldots, \gamma_n)$ is orthonormal. So (3.1)
can be rewritten as follows:

(3.2)  $$w_j = y_j - \sum_{i=1}^{j-1} (y_j^T \gamma_i)\,\gamma_i, \qquad j = 2, 3, \ldots, n.$$

The reader is referred to Section A.5 on page 603 from [1] and page 15 from
[18] for further details.
Define

Ai = 0, A J =J2(yjj l hi and 
(3-3) , ^ 



n 



|Wj|| 2 



j = l,2,...,re. 



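Note $y_j^T \gamma_i \in \mathbb{R}$ and rewrite $(y_j^T \gamma_i)\gamma_i = (\gamma_i \gamma_i^T) y_j$. It is easy to check that
$w_j = (I_n - \Gamma_{n,j} \Gamma_{n,j}^T) y_j$, $\Delta_j = \Gamma_{n,j} \Gamma_{n,j}^T y_j$ and

(3.4)  $$\gamma_j = \frac{y_j}{\sqrt{n}} - \frac{\Delta_j}{\sqrt{n}} + u_j,$$

where $\Gamma_{n,j} = (\gamma_1, \gamma_2, \ldots, \gamma_{j-1})$ and $u_j = (1 - n^{-1/2}\|w_j\|)\gamma_j$.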
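The recursion (3.1) and the decomposition around (3.4) can be checked numerically on a small Gaussian sample. The sketch below runs the classical Gram-Schmidt step exactly as in (3.1), forms $\Delta_j = \sum_{i<j} (y_j^T\gamma_i)\gamma_i$ from the normalized vectors, and measures how far $\gamma_j$ is from $y_j/\sqrt{n} - \Delta_j/\sqrt{n} + u_j$ with $u_j = (1 - n^{-1/2}\|w_j\|)\gamma_j$. Reading (3.4) in this form is an assumption of the example, as are the size `n = 20`, the seed and the helper names.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def classical_gram_schmidt(ys):
    """Return w_1, ..., w_n with w_j = y_j - sum_{i<j} (y_j^T w_i / ||w_i||^2) w_i."""
    ws = []
    for y in ys:
        w = list(y)
        for prev in ws:
            coef = dot(y, prev) / dot(prev, prev)
            w = [a - coef * b for a, b in zip(w, prev)]
        ws.append(w)
    return ws

def decomposition_gap(ys, ws, gammas, j):
    """Largest entry of gamma_j - (y_j/sqrt(n) - Delta_j/sqrt(n) + u_j)."""
    n = len(ys)
    y, g = ys[j], gammas[j]
    delta = [0.0] * n
    for i in range(j):
        coef = dot(y, gammas[i])
        delta = [d + coef * c for d, c in zip(delta, gammas[i])]
    scale = 1.0 - math.sqrt(dot(ws[j], ws[j])) / math.sqrt(n)
    rhs = [(a - d) / math.sqrt(n) + scale * c for a, d, c in zip(y, delta, g)]
    return max(abs(c - r) for c, r in zip(g, rhs))

rng = random.Random(1)  # fixed seed, illustrative only
n = 20
ys = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
ws = classical_gram_schmidt(ys)
gammas = [[a / math.sqrt(dot(w, w)) for a in w] for w in ws]
worst_gap = max(decomposition_gap(ys, ws, gammas, j) for j in range(n))
print(worst_gap)
```

The gap is zero in exact arithmetic; in floating point it reflects only rounding and the mild loss of orthogonality of the classical procedure.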

One repeatedly used fact in later proofs is that if the $n^2$ elements of $Y =
(y_1, y_2, \ldots, y_n)$ are i.i.d. standard normals, then $\Gamma_n = (\gamma_1, \gamma_2, \ldots, \gamma_n)$ follows
the normalized Haar measure on the orthogonal group $O(n)$. In particular,
$\gamma_i$'s are identically distributed and

(3.5)  $$\mathcal{L}(\gamma_i) = \mathcal{L}\left(\frac{y_1}{\|y_1\|}\right)$$

for any $i = 1, 2, \ldots, n$.

For any $n \times n$ orthogonal matrix $G$, observe that $\mathcal{L}(G\Gamma_n^{-1}) = \mathcal{L}((\Gamma_n G^T)^{-1}) =
\mathcal{L}(\Gamma_n^{-1})$ by the invariance property of Haar measures. Also, $\Gamma_n^{-1} = \Gamma_n^T$. From
the uniqueness of Haar measures, we obtain another useful fact that

(3.6)  $$\mathcal{L}(\Gamma_n) = \mathcal{L}(\Gamma_n^T).$$

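We will use the following notation. Let $A = (a_{ij})$ be a $p$ by $q$ matrix. Then

(3.7)  $$|||A||| := \max_{1\le i\le p,\,1\le j\le q} |a_{ij}|.$$

The following definition will also be used:

(3.8)  $$\varepsilon_n(m) = \max_{1\le i\le n,\,1\le j\le m} |\sqrt{n}\,\gamma_{ij} - y_{ij}| \qquad \text{and} \qquad n_\alpha = \left[\frac{n\alpha}{\log n - (5/4)\log(\log n)}\right]$$

for $\alpha > 0$ and $n \ge 2$.

The following says that, to prove part (iii) of Theorem 3, we only need to
work on $\max_{2\le j\le m} |||\Delta_j|||$.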

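Lemma 3.1. Let $\varepsilon_n(m)$ and $n_\alpha$ be as in (3.8). Then

$$P\left(\left|\varepsilon_n(n_\alpha) - \max_{2\le j\le n_\alpha} |||\Delta_j|||\right| \ge \delta\right) \to 0$$

as $n \to \infty$ for any $\alpha > 0$ and $\delta > 0$.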

The following lemma is the key in the proof of Theorem 3. A recursive
inequality is derived. It implies that all $\Delta_j$'s are almost independent when
$j \le n_\alpha$.

Lemma 3.2. Let $\xi$ be a standard normal. Given $\alpha > 0$ and $t > 0$, define

fn (logn) 8- 



/+(A0=p(|£|>t(^| + 



A; = 1,2, 



and fn(k) as the probability above when "+" on the right-hand side is re- 
placed by "— ." Then there exists a constant C = C a> t > such that 
P(max2<j<fc+i I Ajlll <t) is bounded below and above, respectively, by 

( 1 -„ / - W )p( 2? gJ|A J -||< t )-i^ 

and 

(logn)*- 7 



;i-n/+(/c))P(^maxJ||A J |||<^ 



(t' 2 /a)-2 ' 



II 



uniformly on n/(logn) 3 < k < n a as n is sufficiently large, where n a is as 
in (3.8). 
Proof of Theorem 3. Part (i) is obvious. As for (ii), take $r = 1/\log n$,
$s = (\log n)^{3/4}$, $t = t$, $m = m'_n = [\delta n/\log n]$ for some $\delta < \min\{1/4, t^2/100\}$ in
Lemma A.4. Trivially, $t^2/(3(m + \sqrt{n})) \ge t^2(\log n)/(6n\delta)$ and $1/s < 1$, as $n$
is sufficiently large. We obtain that

P(£ n (m' n )>3t) 

< 4ne -n/(41ogn) 2 + 3n2e -(logn)3/ 2 /2 + ^ 

t \ 65 n J 

-►0, 

as n — ► oo by the choice of 5. 

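Now we prove (iii). To simplify notation, set $m = n_\alpha$. We actually will
show that

(3.9)  $$P\left(\max_{2\le j\le m} |||\Delta_j||| \le t\right) \to \begin{cases} 1, & \text{if } t > 2\sqrt{\alpha}, \\ e^{-Kt^2}, & \text{if } t = 2\sqrt{\alpha}, \\ 0, & \text{if } t \in (\sqrt{3\alpha}, 2\sqrt{\alpha}), \end{cases}$$

where $K = (8\sqrt{2\pi})^{-1}$. Since $P(\max_{2\le j\le m} |||\Delta_j||| \le t)$ is increasing in $t$, the
above implies that the left-hand side above goes to zero for any $t \in (0, 2\sqrt{\alpha})$.
This together with (3.9) implies that $\max_{2\le j\le m} |||\Delta_j|||$ converges to $2\sqrt{\alpha}$ in
probability. Lemma 3.1 says that $\varepsilon_n(n_\alpha) - \max_{2\le j\le n_\alpha} |||\Delta_j|||$ converges to
zero in probability as $n \to \infty$. It follows that

(3.10)  $$\varepsilon_n(n_\alpha) \to 2\sqrt{\alpha}$$

in probability as $n \to \infty$. We next show that this implies that $\varepsilon_n([n\alpha/\log n]) \to
2\sqrt{\alpha}$ as $n \to \infty$. Indeed, set $k_\alpha = [n\alpha/\log n]$. For any $\delta \in (0, \sqrt{\alpha})$, choose $\alpha_1$
such that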

~ 4 / 01 a ' 

Then n Q1 < k a < n a , as n is sufficiently large. It follows from the definition of 
e n (m) that £ n (n Ql ) < e n (k a ) < e n (n a ), as n is sufficiently large. Therefore, 

P(\e n (k a ) - 2y^| > 5) < P(e n (fc a ) >2^+5) + P{e n (k a ) <2^-5) 

5 s 



< P(e n (n a ) > 2yJ~a~ + 5) + P \e n (n ai ) < 2ja~ x 

as n is sufficiently large. The above two terms go to zero as n — > oo by (3.10). 
Then (iii) follows. 
Now we show (3.9). 

We continue to use the notation in Lemma 3.2. Set 

A k = P (max (I) A, ||| < t) , b+ = 1 - nf+ (k) , b^ = l-nf~(k), 



logn) c , 
,, 2/ \ „ and m 



n 



(logn) 



+ 2. 
By Lemma A.1, $P(|\xi| > x) \sim (2/(\sqrt{2\pi}\,x)) \exp(-x^2/2)$ as $x \to +\infty$ for a standard
normal $\xi$. Here and later, the notation "$f(x) \sim g(x)$ as $x \to +\infty$" means
that $\lim_{x\to+\infty} f(x)/g(x) = 1$. The same interpretation applies to $a_n \sim \beta_n$ as
$n \to \infty$. It is easy to check that

(3.11)  $$\text{both } f_n^{+}(k) \text{ and } f_n^{-}(k) \sim \frac{2}{t\sqrt{2\pi}} \left(\frac{k}{n}\right)^{1/2} e^{-(t^2/2)(n/k)}$$

uniformly on $m' \le k \le m$ as $n \to \infty$, and also that

(3.12)  $$1 \ge \max\{b_i^{+}, b_i^{-};\ m' \le i \le m\} \to 1$$

as $n \to \infty$, provided $t > \sqrt{2\alpha}$. By Lemma 3.2,

&^ Afc-i - c n <A k < b%A k -i + c n 
for all m' <k < m, as n is sufficiently large. By iteration, we obtain 



(3.13)  $$A_m \ge \Big(\prod_{j=m'}^{m} b_j^-\Big) A_{m'-1} - c_n \sum_{j=0}^{m-m'+2} \Big(\max_{m'\le i\le m}\{b_i^-\}\Big)^j.$$

By (3.12), the second term on the right-hand side is no larger than $nc_n \le
(\log n)^C / n^{(t^2/\alpha)-3}$, as $n$ is sufficiently large. Further, applying the same
argument in (3.13) to the "+" case, we obtain

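(3.14)  $$\Big(\prod_{j=m'}^{m} b_j^-\Big) A_{m'-1} - \frac{(\log n)^C}{n^{(t^2/\alpha)-3}} \le A_m \le \Big(\prod_{j=m'}^{m} b_j^+\Big) A_{m'-1} + \frac{(\log n)^C}{n^{(t^2/\alpha)-3}}$$

as $n$ is sufficiently large. By definition, $A_k = P(\max_{2\le j\le k}|||\Delta_j||| < t)$. From
the proved (ii), we know that $A_{m'-1} \to 1$ as $n\to\infty$ for any $t > 0$. Evidently,
$(\log n)^C n^{3-(t^2/\alpha)} \to 0$, provided $t > \sqrt{3\alpha}$. So to prove (3.9), we only need to
show that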

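(3.15)  $$\text{both } \prod_{j=m'}^{m} b_j^- \text{ and } \prod_{j=m'}^{m} b_j^+ \to \begin{cases} 1, & \text{if } t > 2\sqrt{\alpha},\\ e^{-Kt^2}, & \text{if } t = 2\sqrt{\alpha},\\ 0, & \text{if } t \in (\sqrt{3\alpha}, 2\sqrt{\alpha}),\end{cases}$$

as $n\to\infty$. Recall $b_k^+ = 1 - nf_n^+(k)$ and $b_k^- = 1 - nf_n^-(k)$. Since $|\log(1+x) -
x| \le x^2$ for $x$ small enough, by (3.11) and (3.12),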

$$\prod_{j=m'}^{m} b_j^+ \le \exp\Big(-n\sum_{k=m'}^{m} f_n^+(k)\Big)\cdot\exp\Big(+n^2\sum_{k=m'}^{m} f_n^+(k)^2\Big),$$

$$\prod_{j=m'}^{m} b_j^- \ge \exp\Big(-n\sum_{k=m'}^{m} f_n^-(k)\Big)\cdot\exp\Big(-n^2\sum_{k=m'}^{m} f_n^-(k)^2\Big),$$

as $n$ is sufficiently large. Also, the fact $f_n^+(k) \le f_n^-(k)$ implies that $b_j^+ \ge b_j^-$.
So proving (3.15) reduces to showing that



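(3.16)  $$n^2\sum_{k=m'}^{m} f_n^+(k)^2 \to 0 \qquad\text{and}\qquad n\sum_{k=m'}^{m} f_n^+(k) \to \begin{cases} 0, & \text{if } t > 2\sqrt{\alpha},\\ Kt^2, & \text{if } t = 2\sqrt{\alpha},\\ +\infty, & \text{if } t \in (\sqrt{3\alpha}, 2\sqrt{\alpha}),\end{cases}$$

and that the above is also true if $f_n^+(k)$ is replaced by $f_n^-(k)$.

By (3.11) again, $n^2\sum_{k=m'}^{m} f_n^+(k)^2 \le (\log n)^C n^{3-(t^2/\alpha)} \to 0$ as $n\to\infty$, provided
$t > \sqrt{3\alpha}$. Similarly, $n^2\sum_{k=m'}^{m} f_n^-(k)^2 \to 0$ for $t > \sqrt{3\alpha}$. Let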



$$g(x) = \frac{2}{t\sqrt{2\pi}}\, x^{1/2} e^{-t^2/(2x)}$$

for $x > 0$. By the uniform convergence of $f_n^+(k)/g(k/n)$ and $f_n^-(k)/g(k/n)$
as $n\to\infty$ over $k \in [m', m]$ as in (3.11), to prove the second part in (3.16),
it is enough to show

(3.17)  $$n\sum_{k=m'}^{m} g\Big(\frac{k}{n}\Big) \text{ goes to the second limit in (3.16)}$$

as $n\to\infty$. Since $g(x)$ is nonnegative and increasing in $x$ over $[0, +\infty)$,
it is elementary to see that



$$\Big|\frac{1}{n}\sum_{k=m'}^{m} g\Big(\frac{k}{n}\Big) - \int_0^{m/n} g(x)\,dx\Big| \le \int_{m/n}^{(m+1)/n} g(x)\,dx + \int_0^{m'/n} g(x)\,dx.$$

Using $\sqrt{x}\,e^{-t^2/(2x)} \le e^{-t^2/(2x)}$ on $x \in [0,1]$, the first integral on the right-hand
side is bounded by $(1/n)\exp(-nt^2/(2m+2)) \le n^{-1-(t^2/(2\alpha))}(\log n)^C$,
as $n$ is sufficiently large; the second one is bounded by $\exp(-(\log n)^2)$ as $n$
is large because $m' \sim n(\log n)^{-3}$ by definition. Hence,

(3.18)  $$\frac{1}{n}\sum_{k=m'}^{m} g\Big(\frac{k}{n}\Big) - \int_0^{m/n} g(x)\,dx = o\Big(\frac{1}{n^2}\Big)$$

as $n\to\infty$ if $t > \sqrt{2\alpha}$. Now we evaluate the integral.

Write $\sqrt{x}\exp(-t^2/(2x))\,dx = (2t^{-2}x^{5/2})\,d(e^{-t^2/(2x)})$. By integration by parts,

$$I_n := \int_0^{m/n} \sqrt{x}\,e^{-t^2/(2x)}\,dx = \frac{2}{t^2}\Big(\frac{m}{n}\Big)^{5/2} e^{-nt^2/(2m)} - \frac{5}{t^2}\int_0^{m/n} x^{3/2} e^{-t^2/(2x)}\,dx.$$

Note that $\sqrt{x^3} \le (m/n)\sqrt{x}$ on $[0, m/n]$. The last integral is less than or equal
to $(m/n)I_n$. But $m/n \to 0$, thus,

$$I_n \sim \frac{2}{t^2}\Big(\frac{m}{n}\Big)^{5/2} e^{-nt^2/(2m)}$$

as $n\to\infty$. By the definition of $m$, $nt^2/(2m) = (t^2/(2\alpha))(\log n - (5/4)\log_2 n) +
O(n^{-1}(\log n)^2)$ as $n\to\infty$. It follows that

$$I_n \sim \frac{2\alpha^{5/2}}{t^2}\, n^{-t^2/(2\alpha)} (\log n)^{(5t^2/(8\alpha)) - 5/2}.$$

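From (3.18),

$$n\sum_{k=m'}^{m} g\Big(\frac{k}{n}\Big) = n^2\int_0^{m/n} g(x)\,dx + o(1) = (1+o(1))\,\frac{4\alpha^{5/2}}{\sqrt{2\pi}\,t^3}\, n^{2-(t^2/(2\alpha))} (\log n)^{(5t^2/(8\alpha)) - 5/2} + o(1)$$

provided $t > \sqrt{3\alpha}$. Recall $K = (8\sqrt{2\pi})^{-1}$. The above implies (3.17). □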
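The integration-by-parts step above rests on the claim that $\int_0^a \sqrt{x}\,e^{-t^2/(2x)}\,dx$ is asymptotically $(2/t^2)a^{5/2}e^{-t^2/(2a)}$ as $a \to 0$. The following is a minimal numerical sketch of that claim, not part of the original argument. The function names, the Simpson rule and the sample values of $a$ and $t$ are illustrative assumptions.

```python
import math


def integrand(x: float, t: float) -> float:
    """sqrt(x) * exp(-t^2 / (2x)), extended by 0 at x = 0."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(x) * math.exp(-t * t / (2.0 * x))


def simpson(f, lo: float, hi: float, panels: int = 20000) -> float:
    """Composite Simpson rule on [lo, hi] with an even number of panels."""
    if panels % 2:
        panels += 1
    h = (hi - lo) / panels
    total = f(lo) + f(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def ratio(a: float, t: float) -> float:
    """Integral over [0, a] divided by (2/t^2) a^(5/2) exp(-t^2/(2a))."""
    integral = simpson(lambda x: integrand(x, t), 0.0, a)
    leading = (2.0 / t ** 2) * a ** 2.5 * math.exp(-t * t / (2.0 * a))
    return integral / leading


if __name__ == "__main__":
    # The ratio should approach 1 from below as a decreases.
    for a in (0.05, 0.01, 0.001):
        print(a, ratio(a, t=1.0))
```

The same integration by parts gives the two-sided bound $1/(1 + 5a/t^2) \le \text{ratio} \le 1$, which is what forces the ratio to 1 when $a = m/n \to 0$.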

4. Technical lemmas. Now we prove the lemmas used in the previous 
sections. To see them clearly, we break them into two subsections. 

4.1. The proofs of lemmas used in Section 2. 

Proof of Lemma 2.1. (i) First, when $n = 1$, $\Gamma(n + (1/2))/(\sqrt{n}\,\Gamma(n)) =
\sqrt{\pi}/2 \in (5/6, 1)$. So (i) is true for $n = 1$. Now assume $n \ge 2$.

Using the fact that $\Gamma(x+1) = x\Gamma(x)$ for any $x > 0$ and $\Gamma(1/2) = \sqrt{\pi}$, we
have that

$$\frac{\Gamma(n + (1/2))}{\Gamma(n)} = \frac{\sqrt{\pi}\,n}{2^{2n}} \cdot \frac{(2n)!}{(n!)^2}.$$

By Stirling's formula (see, e.g., Lemma 1 on page 45 from [6]), $n! = \sqrt{2\pi n}\, n^n
e^{-n+\theta_n/(12n)}$ for all $n \ge 2$, where

(4.1)  $$\frac{n}{n + 1/12} < \theta_n < 1.$$

It is easily checked that

$$\frac{\Gamma(n + (1/2))}{\sqrt{n}\,\Gamma(n)} = \exp\Big(\frac{\theta_n - 4\theta_n'}{24n}\Big)$$

for some $\theta_n$ corresponding to $(2n)!$ and $\theta_n'$ corresponding to $n!$ in (4.1). Evidently,
$(\theta_n - 4\theta_n')/24 \in (-1/6, 0)$ for all $n \ge 2$. Then the desired result follows
by using the inequality $e^x > 1 + x$ for all $x \ne 0$.

(ii) A direct verification shows that (ii) is true for $n = 1$. Now assume
$n \ge 2$. If $n = 2k$ for some integer $k \ge 1$, then (ii) follows from (i). Now
suppose $n = 2k + 1$ for $k \ge 1$. Trivially,

$$\frac{\Gamma((n+1)/2)}{\sqrt{n/2}\,\Gamma(n/2)} = \Big(\frac{\Gamma(k + (1/2))}{\sqrt{k}\,\Gamma(k)}\Big)^{-1} \cdot \sqrt{\frac{2k}{2k+1}}.$$

By (i), the above ratio is between $\sqrt{2k/(2k+1)}$ and $(1 - (6k)^{-1})^{-1}$. By
a simple calculation, $\sqrt{2k/(2k+1)} > 1 - (3/(5n))$ and $(1 - (6k)^{-1})^{-1} < 1 +
(5k)^{-1}$ for all $k \ge 1$. So (ii) follows. □
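As a sanity check on part (i), the proof shows that the ratio equals $\exp(c_n/n)$ with $c_n \in (-1/6, 0)$, so it lies strictly between $1 - 1/(6n)$ and $1$. The sketch below tests this two-sided bound numerically. The bound is read off from the proof rather than from the lemma statement (which is not reproduced in this section), and the function names are hypothetical.

```python
import math


def gamma_ratio(n: int) -> float:
    """Gamma(n + 1/2) / (sqrt(n) * Gamma(n)), via log-gamma for stability."""
    return math.exp(math.lgamma(n + 0.5) - math.lgamma(n)) / math.sqrt(n)


def check_part_i(n_max: int = 1000) -> bool:
    """True if 1 - 1/(6n) < ratio < 1 holds for every n = 1, ..., n_max."""
    return all(
        1.0 - 1.0 / (6.0 * n) < gamma_ratio(n) < 1.0
        for n in range(1, n_max + 1)
    )


if __name__ == "__main__":
    print(gamma_ratio(1), math.sqrt(math.pi) / 2.0)  # the n = 1 case from the proof
    print(check_part_i())
```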

Proof of Lemma 2.2. By the multivariate Taylor's expansion formula 
(see page 361 from [2] and page 172 from [22]), 

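$$f(x,y) = f\Big(\frac{i}{n},\frac{j}{n}\Big) + f_x\Big(\frac{i}{n},\frac{j}{n}\Big)\Big(x - \frac{i}{n}\Big) + f_y\Big(\frac{i}{n},\frac{j}{n}\Big)\Big(y - \frac{j}{n}\Big) + \delta_{ij}(x,y)$$

for some $\xi \in [i/n, x]$ and $\eta \in [j/n, y]$, where

(4.3)  $$\delta_{ij}(x,y) = \frac{1}{2}\Big(\Big(x - \frac{i}{n}\Big)^2\frac{\partial^2 f}{\partial x^2} + 2\Big(x - \frac{i}{n}\Big)\Big(y - \frac{j}{n}\Big)\frac{\partial^2 f}{\partial x\,\partial y} + \Big(y - \frac{j}{n}\Big)^2\frac{\partial^2 f}{\partial y^2}\Big),$$

with the second derivatives evaluated at $(\xi, \eta)$. By the given condition,

$$|\delta_{ij}(x,y)| \le \frac{M}{2}\Big(\Big|x - \frac{i}{n}\Big| + \Big|y - \frac{j}{n}\Big|\Big)^2 \le M\Big(\Big(x - \frac{i}{n}\Big)^2 + \Big(y - \frac{j}{n}\Big)^2\Big).$$

Then

$$\int_{j/n}^{(j+1)/n}\int_{i/n}^{(i+1)/n} f(x,y)\,dx\,dy = \frac{1}{n^2} f\Big(\frac{i}{n},\frac{j}{n}\Big) + \frac{1}{2n^3}\Big(f_x\Big(\frac{i}{n},\frac{j}{n}\Big) + f_y\Big(\frac{i}{n},\frac{j}{n}\Big)\Big) + \int_{j/n}^{(j+1)/n}\int_{i/n}^{(i+1)/n} \delta_{ij}(x,y)\,dx\,dy,$$

where

$$\Big|\int_{j/n}^{(j+1)/n}\int_{i/n}^{(i+1)/n} \delta_{ij}(x,y)\,dx\,dy\Big| \le M\int_0^{1/n}\int_0^{1/n} (x^2 + y^2)\,dx\,dy = \frac{2M}{3n^4}$$

since $|\delta_{ij}(x,y)| \le M((x - i/n)^2 + (y - j/n)^2)$ by (4.3). The desired result
follows by taking the sum over $i$ from $i_1$ to $i_2$, and $j$ from $j_1$ to $j_2$. □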

Proof of Lemma 2.4. (i) It is not difficult to check that 

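$$\mathrm{tr}(X'X) = \sum_{j=1}^{q}\sum_{i=1}^{p} x_{ij}^2;$$

(4.4)  $$\mathrm{tr}((X'X)^2) = \sum_{j=1}^{q}\sum_{i=1}^{p} x_{ij}^4 + \sum_{j=1}^{q}\sum_{i\ne l=1}^{p} x_{ij}^2x_{lj}^2 + \sum_{i=1}^{p}\sum_{j\ne k=1}^{q} x_{ij}^2x_{ik}^2 + \sum_{i\ne l,\, j\ne k} x_{ij}x_{ik}x_{lk}x_{lj}.$$

Let

$$B_1 = \sum_{j=1}^{q}\sum_{i=1}^{p} (x_{ij}^4 - 3), \qquad B_2 = \sum_{j=1}^{q}\sum_{i\ne l=1}^{p} (x_{ij}^2 - 1)(x_{lj}^2 - 1),$$

$$B_3 = \sum_{i=1}^{p}\sum_{j\ne k=1}^{q} (x_{ij}^2 - 1)(x_{ik}^2 - 1), \qquad B_4 = \sum_{i\ne l,\, j\ne k} x_{ij}x_{ik}x_{lk}x_{lj}.$$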

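By simple algebra,

(4.5)  $$\mathrm{tr}((X'X)^2) = \Big(\sum_{i=1}^{4} B_i\Big) + 2(p + q - 2)\,\mathrm{tr}(X'X) + C_{p,q},$$

where $C_{p,q}$ is a constant depending on $p$ and $q$. It is easy to check that $EB_i = 0$ for
$1 \le i \le 4$, $\mathrm{Cov}(B_i, B_j) = 0$ for all $1 \le i \ne j \le 4$, and $\mathrm{Cov}(B_i, \mathrm{tr}(X'X)) = 0$
for $i = 2, 3, 4$. Also, each $B_i$ is a sum of uncorrelated random variables.
Therefore,

$$\mathrm{Var}(\mathrm{tr}((X'X)^2)) = \Big(\sum_{i=1}^{4}\mathrm{Var}(B_i)\Big) + 4(p + q - 2)^2\,\mathrm{Var}(\mathrm{tr}(X'X)) + 4(p + q - 2)\,\mathrm{Cov}(B_1, \mathrm{tr}(X'X)).$$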

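Now it is easy to verify that $\mathrm{Cov}(B_1, \mathrm{tr}(X'X)) = O(p^2)$ and $\mathrm{Var}(B_i) =
O(p^3)$ for $i = 1, 2, 3$ as $p\to\infty$. Moreover, $\mathrm{Var}(B_4) = 4pq(p-1)(q-1)$, since each
product in $B_4$ appears four times in the sum over ordered indices, and
$\mathrm{Var}(\mathrm{tr}(X'X)) = 2pq$. Combining these quantities together, we obtain (i).
(ii) By (4.5) again,

$$\mathrm{Cov}(\mathrm{tr}(X'X), \mathrm{tr}((X'X)^2)) = \mathrm{Cov}(\mathrm{tr}(X'X), B_1) + 2(p + q - 2)\cdot\mathrm{Var}(\mathrm{tr}(X'X)) \sim 4pq(p + q)$$

as $n\to\infty$. □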
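The algebra behind (4.5) is easy to get wrong by a sign or a constant, so the sketch below checks it numerically on one random $p \times q$ matrix. It assumes that $B_1,\ldots,B_4$ are the centered sums appearing in (4.4) and (4.5), and it uses the value $C_{p,q} = pq(5 - p - q)$. That value is derived here by expanding the centered products and is not quoted from the source. All function names are illustrative.

```python
import random


def trace_identity_gap(p: int, q: int, seed: int = 0) -> float:
    """tr((X'X)^2) minus [B1 + B2 + B3 + B4 + 2(p+q-2) tr(X'X) + C_pq]."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(p)]

    # Direct computation: tr((X'X)^2) = sum_{j,k} (sum_i x_ij x_ik)^2.
    gram = [[sum(x[i][j] * x[i][k] for i in range(p)) for k in range(q)]
            for j in range(q)]
    direct = sum(gram[j][k] ** 2 for j in range(q) for k in range(q))
    tr = sum(gram[j][j] for j in range(q))

    b1 = sum(x[i][j] ** 4 - 3.0 for i in range(p) for j in range(q))
    b2 = sum((x[i][j] ** 2 - 1.0) * (x[l][j] ** 2 - 1.0)
             for j in range(q) for i in range(p) for l in range(p) if i != l)
    b3 = sum((x[i][j] ** 2 - 1.0) * (x[i][k] ** 2 - 1.0)
             for i in range(p) for j in range(q) for k in range(q) if j != k)
    b4 = sum(x[i][j] * x[i][k] * x[l][k] * x[l][j]
             for i in range(p) for l in range(p) if i != l
             for j in range(q) for k in range(q) if j != k)

    # Assumed constant, obtained by expanding the centered products by hand.
    c_pq = p * q * (5 - p - q)
    return direct - (b1 + b2 + b3 + b4 + 2 * (p + q - 2) * tr + c_pq)


if __name__ == "__main__":
    print(trace_identity_gap(3, 4, seed=1))  # should be zero up to rounding
```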

4.2. The proofs of lemmas used in Section 3. Before the proof of these 
lemmas, we need some preliminary results for a preparation. 

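Lemma 4.1. Let $E_i$, $i = 0, 1, 2, \ldots, n$, be events in a probability space
$(\Omega, \mathcal{F}, P)$. Then

$$\Big|P\Big(\bigcap_{i=0}^{n} E_i\Big) - P(E_0) + \sum_{i=1}^{n} P(E_0\backslash E_i)\Big| \le \sum_{1\le i<j\le n} P(E_i^c E_j^c).$$

Proof. First, $P(E_0) - P(\bigcap_{i=0}^{n} E_i) = P(\bigcup_{i=1}^{n} E_0\backslash E_i)$. By Bonferroni's
inequality, it is bounded above and below, respectively, by

$$\sum_{i=1}^{n} P(E_0\backslash E_i) \qquad\text{and}\qquad \sum_{i=1}^{n} P(E_0\backslash E_i) - \sum_{1\le i<j\le n} P((E_0\backslash E_i)\cap(E_0\backslash E_j)).$$

Note that $(E_0\backslash E_i)\cap(E_0\backslash E_j) \subset E_i^c E_j^c$. The desired conclusion follows. □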

Lemma 4.2. Let $\{\xi_i; i \ge 1\}$ be a sequence of i.i.d. random variables with
the standard normal distribution. Set $S_k = \sum_{i=1}^{k} \xi_i^2$. Then

$$P\Big(\Big|\frac{S_n}{S_m} - \frac{n}{m}\Big| \ge x\Big) \le 6\exp\Big(-\frac{m^4x^2}{48n^3}\Big)$$

for any $m \ge 1$, $n \ge 1$ and $x > 0$ satisfying $m \le n/2$ and $x \le n/m$.
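Proof. Write

$$\frac{S_n}{S_m} - \frac{n}{m} = \frac{(m - n)(S_m - m) + m[(S_n - S_m) - (n - m)]}{mS_m}.$$

Then

$$\Big|\frac{S_n}{S_m} - \frac{n}{m}\Big| \le \frac{n}{mS_m}\max\{|S_m - m|,\ |(S_n - S_m) - (n - m)|\}.$$

Since the distribution of $S_n - S_m$ is equal to that of $S_{n-m}$, we have that

(4.6)  $$P\Big(\Big|\frac{S_n}{S_m} - \frac{n}{m}\Big| \ge x\Big) \le P\Big(S_m \le \frac{m}{2}\Big) + P\Big(|S_m - m| \ge \frac{m^2x}{2n}\Big) + P\Big(|S_{n-m} - (n - m)| \ge \frac{m^2x}{2n}\Big).$$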

Let $P_1$, $P_2$ and $P_3$ stand for the previous three probabilities in order. Define
$I(x) := \sup_{\theta\in\mathbb{R}}\{\theta x - \log E\exp(\theta\xi_1^2)\}$ for $x \in \mathbb{R}$. It is not difficult to verify
the following:

(i) $I(x) = (x - 1 - \log x)/2$ for $x > 0$; $I(x) = +\infty$ for $x \le 0$;

(ii) $I(x)$ is increasing on $[1, \infty)$ and decreasing on $(0, 1)$.

The above two facts can also be seen in Lemma 3.2 from [19]. By (i) of
Lemma A.3,

$$P_1 \le 2e^{-mI(1/2)} \le 2\exp(-(\log 4 - 1)m/4) \le 2\exp(-m/12).$$

Define $\eta(x) = x - \log(1 + x) - (x^2/3)$ for $x > -1$. Then $\eta(0) = 0$ and $\eta'(x) =
x(1 - 2x)(1 + x)^{-1}/3$. Hence, $\eta'(x) \ge 0$ for $x \in [0, 1/2]$ and $\eta'(x) < 0$ for
$x \in [-1/2, 0)$. It follows that $x - \log(1 + x) \ge x^2/3$ for $|x| \le 1/2$. Therefore,



$$P_2 \le 2\exp\Big\{-m\cdot\min\Big\{I\Big(1 + \frac{mx}{2n}\Big),\ I\Big(1 - \frac{mx}{2n}\Big)\Big\}\Big\} \le 2e^{-m^3x^2/(24n^2)}$$

provided $x \le n/m$, where property (ii) of $I(x)$ above is used. Similarly,

$$P_3 \le P\Big(\Big|\frac{S_{n-m}}{n - m} - 1\Big| \ge \frac{m^2x}{2n^2}\Big) \le 2\exp\Big\{-(n - m)\cdot\min\Big\{I\Big(1 + \frac{m^2x}{2n^2}\Big),\ I\Big(1 - \frac{m^2x}{2n^2}\Big)\Big\}\Big\} \le 2e^{-m^4x^2/(48n^3)}$$

provided $m \le n/2$ and $x \le n^2/m^2$, where the fact that $n - m \ge n/2$ is used
in the last step. Thus,

$$P_1 + P_2 + P_3 \le 6\exp\Big(-\min\Big\{\frac{m}{12},\ \frac{m^3x^2}{24n^2},\ \frac{m^4x^2}{48n^3}\Big\}\Big)$$

if $m \le n/2$ and $x \le n/m$. By a simple verification, the minimum above is
actually $m^4x^2/(48n^3)$. This together with (4.6) proves the lemma. □
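The proof uses three elementary facts: $I(1/2) \ge 1/12$ for the rate function $I(x) = (x - 1 - \log x)/2$, the inequality $x - \log(1 + x) \ge x^2/3$ on $|x| \le 1/2$, and the claim that $m^4x^2/(48n^3)$ is the smallest of the three exponents when $m \le n/2$ and $x \le n/m$. The sketch below checks them on a grid. The function names and the sample values of $(m, n, x)$ are illustrative assumptions, not part of the original text.

```python
import math


def rate(x: float) -> float:
    """Rate function of xi^2 for standard normal xi: (x - 1 - log x)/2 on x > 0."""
    if x <= 0.0:
        return math.inf
    return (x - 1.0 - math.log(x)) / 2.0


def eta(x: float) -> float:
    """eta(x) = x - log(1 + x) - x^2/3, claimed nonnegative on [-1/2, 1/2]."""
    return x - math.log1p(x) - x * x / 3.0


def exponents(m: int, n: int, x: float):
    """The three exponents whose minimum appears in the final bound."""
    return (m / 12.0,
            m ** 3 * x ** 2 / (24.0 * n ** 2),
            m ** 4 * x ** 2 / (48.0 * n ** 3))


if __name__ == "__main__":
    print(rate(0.5), 1.0 / 12.0)
    print(min(eta(i / 100.0) for i in range(-50, 51)))
    print(exponents(10, 100, 10.0))
```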



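Proof of Lemma 3.1. Write $m = n_\alpha$ for simplification. By (3.4), we
know that

$$\max_{1\le j\le m}|||\sqrt{n}\gamma_j - y_j + \Delta_j||| \le \max_{1\le j\le m}|||\sqrt{n}u_j|||,$$

where $u_j = (1 - n^{-1/2}\|w_j\|)\gamma_j$. By the triangle inequality,

$$\Big|\epsilon_n(m) - \max_{2\le j\le m}|||\Delta_j|||\Big| \le \max_{1\le j\le m}|||\sqrt{n}u_j||| \le \Big\{\max_{1\le j\le n}|||\sqrt{n}\gamma_j|||\Big\}\cdot\max_{1\le j\le m}\Big|1 - \frac{\|w_j\|^2}{n}\Big|,$$

where the inequality $|1 - \sqrt{x}| \le |1 - x|$ is used in the last step. Proposition
1 from [19] implies that

$$\sqrt{\frac{n}{\log n}}\,\max_{1\le j\le n}|||\gamma_j||| \to 2$$

in probability as $n\to\infty$. To prove the lemma, it suffices to show that

(4.7)  $$B_n := \sqrt{\log n}\,\max_{1\le j\le m}\Big|1 - \frac{\|w_j\|^2}{n}\Big| \to 0$$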



in probability as $n\to\infty$. By orthogonality, $(I_n - \Gamma_{n,j}\Gamma_{n,j}^T)^2 = I_n - \Gamma_{n,j}\Gamma_{n,j}^T$.
This says that $I_n - \Gamma_{n,j}\Gamma_{n,j}^T$ is an idempotent matrix. So by (3.4), $w_j \sim
N_n(0, I_n - \Gamma_{n,j}\Gamma_{n,j}^T)$ conditioning on $y_1, y_2, \ldots, y_{j-1}$, where $\Gamma_{n,j} = (\gamma_1, \gamma_2,
\ldots, \gamma_{j-1})$. In this context, "$\sim$" means that both sides of "$\sim$" have the same
probability distribution. It also follows that $\mathrm{rank}(I_n - \Gamma_{n,j}\Gamma_{n,j}^T) = \mathrm{trace}(I_n -
\Gamma_{n,j}\Gamma_{n,j}^T) = \mathrm{trace}(I_n) - \mathrm{trace}(\Gamma_{n,j}\Gamma_{n,j}^T) = n - j + 1$. By Lemma A.2, $\|w_j\|^2 \sim
\chi^2(n - j + 1)$. Obviously, $2tn/\sqrt{\log n} - j \ge tn/\sqrt{\log n}$ for all $1 \le j \le m$, as
$n$ is sufficiently large. Let $\{\xi_1, \xi_2, \ldots, \xi_n\}$ be independent standard normals.
Then

(4.8)  $$P\Big(\Big|1 - \frac{\|w_j\|^2}{n}\Big| \ge 2t(\log n)^{-1/2}\Big) \le P\Big(\Big|\sum_{k=1}^{n-j+1}(\xi_k^2 - 1)\Big| \ge \frac{tn}{\sqrt{\log n}}\Big) \le P\Big(\frac{1}{\sqrt{n-j+1}}\Big|\sum_{k=1}^{n-j+1}(\xi_k^2 - 1)\Big| \ge n^{1/3}\Big) \le \exp(-n^{1/3}),$$

uniformly for $1 \le j \le m$ as $n$ is sufficiently large, where Lemma A.3 is used
in the last inequality [heuristically, since $\sum_{k=1}^{n-j+1}(\xi_k^2 - 1)$ is a sum of i.i.d.
random variables with mean zero and variance equal to two, one can think
of $\sum_{k=1}^{n-j+1}(\xi_k^2 - 1)/\sqrt{n - j + 1}$ as a normal. Then the last inequality above
is intuitive]. By the union bound,

$$P(B_n \ge 2t) \le n\cdot\max_{1\le j\le m} P\Big(\Big|1 - \frac{\|w_j\|^2}{n}\Big| \ge 2t(\log n)^{-1/2}\Big) \le n\cdot\exp(-n^{1/3}) \to 0$$

as $n\to\infty$. So (4.7) follows. □
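The objects in this proof can be made concrete with a small Gram-Schmidt computation. The sketch below assumes, consistently with how (3.3) and (3.4) are used here, that $\Delta_j$ is the projection of $y_j$ onto the span of $\gamma_1,\ldots,\gamma_{j-1}$, that $w_j = y_j - \Delta_j$ and that $\gamma_j = w_j/\|w_j\|$. Those displays are not reproduced in this section, so this reading is an assumption. The code then checks the identity $\sqrt{n}\gamma_j - y_j + \Delta_j = (\sqrt{n} - \|w_j\|)\gamma_j$ that drives the first inequality. Function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_columns(n, seed=0):
    """n columns of i.i.d. standard normals, playing the role of y_1, ..., y_n."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]


def gram_schmidt(columns):
    """Classical Gram-Schmidt; returns (gammas, ws, deltas), one entry per column."""
    gammas, ws, deltas = [], [], []
    for y in columns:
        # delta: projection of y onto the span of the gammas built so far.
        delta = [0.0] * len(y)
        for g in gammas:
            c = dot(g, y)
            delta = [d + c * gi for d, gi in zip(delta, g)]
        w = [yi - di for yi, di in zip(y, delta)]
        norm_w = math.sqrt(dot(w, w))
        gammas.append([wi / norm_w for wi in w])
        ws.append(w)
        deltas.append(delta)
    return gammas, ws, deltas


def identity_residual(columns):
    """Largest entry of sqrt(n)*gamma_j - y_j + Delta_j - (sqrt(n) - |w_j|)*gamma_j."""
    n = len(columns)
    gammas, ws, deltas = gram_schmidt(columns)
    worst = 0.0
    for y, g, w, d in zip(columns, gammas, ws, deltas):
        scale = math.sqrt(n) - math.sqrt(dot(w, w))
        for yi, gi, di in zip(y, g, d):
            worst = max(worst, abs(math.sqrt(n) * gi - yi + di - scale * gi))
    return worst


if __name__ == "__main__":
    cols = sample_columns(6, seed=3)
    print(identity_residual(cols))  # zero up to rounding
```

Classical Gram-Schmidt is used only because it mirrors the construction in the text; for large $n$ a numerically stabler variant would be preferable.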

We need the following two lemmas for the proof of Lemma 3.2. 

Lemma 4.3. Let $\Delta_j$ be as in (3.3) and $n_\alpha$ in (3.8). Write $\Delta_j = (\Delta_{1j},
\Delta_{2j}, \ldots, \Delta_{nj})^T \in \mathbb{R}^n$. Then, for any $t > 0$,

$$P(|\Delta_{1j}| \ge t,\ |\Delta_{2j}| \ge t) \le e^{-t^2n/j} + e^{-(\log n)^2/11},$$

uniformly on $j \in (n/(\log n)^3, n_\alpha)$ as $n$ is sufficiently large.

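Proof. Again, write $m = n_\alpha$. By (3.4), $\Delta_j = \Gamma_{n,j}\Gamma_{n,j}^T y_j$, where $\Gamma_{n,j} =
(\gamma_1, \gamma_2, \ldots, \gamma_{j-1})$ and $y_j = (y_{1j}, y_{2j}, \ldots, y_{nj})^T \in \mathbb{R}^n$. It is easy to see from the
orthogonality of the $\gamma_j$'s and the independence between $y_j$ and $\Gamma_{n,j}$ that

(4.9)  $$\Delta_j \overset{d}{=} \Gamma_{n,j}(y_{1j}, y_{2j}, \ldots, y_{j-1,j})^T.$$

Here and later, the notation "$\overset{d}{=}$" means that the distributions of both sides
are identical. Thus,

(4.10)  $$(\Delta_{1j}, \Delta_{2j})^T \overset{d}{=} \Big(\sum_{k=1}^{j-1}\gamma_{1k}y_{kj},\ \sum_{k=1}^{j-1}\gamma_{2k}y_{kj}\Big)^T.$$

Observe that $\gamma_1, \gamma_2, \ldots, \gamma_{j-1}$ are functions of $y_1, y_2, \ldots, y_{j-1}$. We know from
(4.10) that $(\Delta_{1j}, \Delta_{2j})^T \sim N_2(\mu, \Sigma)$ conditioning on $y_1, y_2, \ldots, y_{j-1}$. Easily,
$\mu = 0$ and $\mathrm{Var}(\Delta_{pj}) = \sum_{k=1}^{j-1}\gamma_{pk}^2$ for $p = 1, 2$, and the correlation coefficient
of $\Delta_{1j}$ and $\Delta_{2j}$ is

(4.11)  $$\rho_j := \frac{\sum_{k=1}^{j-1}\gamma_{1k}\gamma_{2k}}{\sqrt{\sum_{k=1}^{j-1}\gamma_{1k}^2}\,\sqrt{\sum_{k=1}^{j-1}\gamma_{2k}^2}}.$$

Therefore, there exist two independent standard normals $\xi$ and $\eta$ such that
the conditional distribution of $\Delta_{1j}$ and $\Delta_{2j}$ given $y_1, y_2, \ldots, y_{j-1}$ is the same
as that of $(\sum_{k=1}^{j-1}\gamma_{1k}^2)^{1/2}\xi$ and $(\sum_{k=1}^{j-1}\gamma_{2k}^2)^{1/2}(\rho_j\xi + \sqrt{1 - \rho_j^2}\,\eta)$. It follows that

(4.12)  $$P(|\Delta_{1,j+1}| \ge t,\ |\Delta_{2,j+1}| \ge t \mid y_1, y_2, \ldots, y_j) \le P\Big(|\xi| \ge t\Big(\sum_{k=1}^{j}\gamma_{1k}^2\Big)^{-1/2},\ |\eta| \ge t\Big(\sum_{k=1}^{j}\gamma_{2k}^2\Big)^{-1/2} - |\rho_{j+1}\xi| \ \Big|\ y_1, y_2, \ldots, y_j\Big).$$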

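Now, by (3.5) and (3.6), there exists a sequence of i.i.d. standard normals
$\psi_1, \psi_2, \ldots, \psi_n$ such that $\mathcal{L}(\sum_{k=1}^{j}\gamma_{pk}^2) = \mathcal{L}(S_j/S_n)$ for $p = 1, 2$, where $S_j =
\sum_{i=1}^{j}\psi_i^2$. By Lemma 4.2,

$$\max_{n/(\log n)^3\le j\le m} P\Big(\Big|\Big(\sum_{k=1}^{j}\gamma_{pk}^2\Big)^{-1/2} - \sqrt{\frac{n}{j}}\Big| \ge n^{-1/5}\Big) \le 6\max_{n/(\log n)^3\le j\le m}\Big\{\exp\Big(-\frac{j^4n^{-2/5}}{48n^3}\Big)\Big\} \le e^{-\sqrt{n}},$$

as $n$ is sufficiently large. By (4.12),

(4.13)  $$P(|\Delta_{1,j+1}| \ge t,\ |\Delta_{2,j+1}| \ge t) \le P\Big(|\xi| \ge t\Big(\sqrt{\frac{n}{j}} - n^{-1/5}\Big),\ |\eta| \ge t\Big(\sqrt{\frac{n}{j}} - n^{-1/5}\Big) - |\rho_{j+1}\xi|\Big) + 2e^{-\sqrt{n}}.$$

Since $P(|\xi| \ge x) \le (1/x)\exp(-x^2/2)$ for any $x > 0$, by Lemma 4.4 below,

$$P\Big(|\rho_{j+1}\xi| \ge \frac{(\log n)^7}{n^{1/4}}\Big) \le P\Big(|\rho_{j+1}| \ge \frac{(\log n)^6}{n^{1/4}}\Big) + P(|\xi| \ge \log n) \le 2e^{-(\log n)^2/10}$$

for sufficiently large $n$. Thus, combining this with (4.13), we obtain from the
independence of $\xi$ and $\eta$ that $P(|\Delta_{1,j+1}| \ge t,\ |\Delta_{2,j+1}| \ge t)$ is bounded above
by

$$P(|\xi| \ge A,\ |\eta| \ge B) + 3e^{-(\log n)^2/10} \le 2e^{-t^2n/j} + e^{-(\log n)^2/11},$$

$$\text{with}\quad A = t\Big(\sqrt{\frac{n}{j}} - n^{-1/5}\Big) \quad\text{and}\quad B = t\Big(\sqrt{\frac{n}{j}} - n^{-1/5}\Big) - n^{-1/4}(\log n)^7,$$

uniformly on $j \in (n/(\log n)^3, m)$ as $n$ is sufficiently large, where $A$ and $B$
are essentially $t\sqrt{n/j}$ when using Lemma A.1 in the last step. □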

Now we measure how fast the correlation coefficient $\rho_j$ goes to zero. The
idea behind the proof is that we view the $\gamma_{ij}$'s in the expression of $\rho_j$ in (4.11)
as independent normals with mean zero and standard deviation $n^{-1/2}$. This
intuition will be carried out rigorously by using Lemma A.4.

Lemma 4.4. Let $\rho_j$ be as in (4.11). Then

$$P(|\rho_{j+1}| \ge (\log n)^6/n^{1/4}) \le e^{-(\log n)^2/10},$$

uniformly on $j \in (n/(\log n)^3, n_\alpha)$ for sufficiently large $n$.



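Proof. Write $m = n_\alpha$ for simplification. Note that $(\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1n})$
has the same distribution as that of $(\gamma_{21}, \gamma_{22}, \ldots, \gamma_{2n})$ because of the Haar
invariance of $\Gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n)$. For any $a > 0$,

(4.14)  $$P(|\rho_{j+1}| \ge a) \le P\Big(\Big|\sum_{k=1}^{j}\gamma_{1k}\gamma_{2k}\Big| \ge \frac{aj}{2n}\Big) + 2P\Big(\Big(\sum_{k=1}^{j}\gamma_{1k}^2\Big)^{-1} \ge \frac{2n}{j}\Big).$$

By (3.5) and (3.6), the sum appearing in the last probability in (4.14) is
equal to $S_j/S_n$ in law as in Lemma 4.2. By this lemma,

(4.15)  $$P\Big(\Big|\Big(\sum_{k=1}^{j}\gamma_{1k}^2\Big)^{-1} - \frac{n}{j}\Big| \ge \frac{n}{j}\Big) = P\Big(\Big|\frac{S_n}{S_j} - \frac{n}{j}\Big| \ge \frac{n}{j}\Big) \le 6\exp\Big(-\frac{j^4}{48n^3}\Big(\frac{n}{j}\Big)^2\Big)$$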
uniformly on $j \in (n/(\log n)^3, m)$ for $n$ sufficiently large. Recall (3.6) again.
Choosing $m = 2$, $t = n^{-1/4}\log n$, $s = \log n$ and $r = (\log n)^2/\sqrt{n}$ in Lemma
A.4, by (3.6), we have $2n$ i.i.d. standard normals $\{y_{ij};\ 1 \le i \le 2,\ 1 \le j \le n\}$
such that

(4.16)  $$P\Big(\epsilon_n(2) \ge \frac{(\log n)^2}{n^{1/4}}\Big) \le 4n\exp\Big(-\frac{(\log n)^4}{16}\Big) + 3n^2e^{-(\log n)^2/2} + 3n^{5/4}\Big(1 + \frac{(\log n)^2}{3\sqrt{n}(\sqrt{n} + 2)}\Big)^{-n/2} \le e^{-(\log n)^2/9}$$

for $n$ large enough, where $\epsilon_n(2) = \max_{1\le i\le 2,\,1\le j\le n}|\sqrt{n}\gamma_{ij} - y_{ij}|$. Notice that

(4.17)  $$\Big|n\sum_{k=1}^{j}\gamma_{1k}\gamma_{2k} - \sum_{k=1}^{j}y_{1k}y_{2k}\Big| \le \frac{(\log n)^2}{n^{1/4}}\sum_{i=1}^{2}\sum_{k=1}^{j}|y_{ik}| + \frac{2j(\log n)^4}{n^{1/2}}$$

on $\{\epsilon_n(2) < (\log n)^2/n^{1/4}\}$. Note that $E\exp(|y_{11}y_{21}|/8) < \infty$ and $E|y_{11}| < 1$.
By Lemma A.3, there exists a universal constant $C > 0$ such that

(4.18)  $$P\Big(\sum_{i=1}^{2}\sum_{k=1}^{j}|y_{ik}| \ge 3j\Big) \le e^{-Cj} \qquad\text{and}\qquad P\Big(\Big|\sum_{k=1}^{j}y_{1k}y_{2k}\Big| \ge \sqrt{j}\log j\Big) \le e^{-(\log n)^2/3},$$

uniformly on $j \in (n/(\log n)^3, m)$, where the first one comes from (i) of
Lemma A.3 and the second is obtained by (ii) of Lemma A.3 in the same
way as in (4.8). If neither of the events in the above two probabilities occurs
and $\epsilon_n(2) < (\log n)^2/n^{1/4}$, then from (4.17)



$$\Big|n\sum_{k=1}^{j}\gamma_{1k}\gamma_{2k}\Big| \le \sqrt{j}\log j + \frac{3j(\log n)^2}{n^{1/4}} + \frac{2j(\log n)^4}{n^{1/2}} \le 5n^{3/4}(\log n)^2,$$

uniformly on $j \in (n/(\log n)^3, m)$ for sufficiently large $n$. Thus, from (4.16)
and (4.18),

$$P\Big(\Big|\sum_{k=1}^{j}\gamma_{1k}\gamma_{2k}\Big| \ge \frac{5(\log n)^2}{n^{1/4}}\Big) \le 2e^{-(\log n)^2/9}$$

as $n$ is sufficiently large. Choose $a = (\log n)^6/n^{1/4}$ in (4.14). Then, $aj/(2n) \ge
5(\log n)^2/n^{1/4}$ for all $j \in (n(\log n)^{-3}, m)$, as $n$ is sufficiently large. It follows
from the above that

(4.19)  $$P\Big(\Big|\sum_{k=1}^{j}\gamma_{1k}\gamma_{2k}\Big| \ge \frac{aj}{2n}\Big) \le 2e^{-(\log n)^2/9},$$

uniformly on $j \in (n/(\log n)^3, m)$ as $n$ is sufficiently large. It is easy to see
that the last probability in (4.14) is bounded by the first probability in
(4.15). Combining (4.14), (4.15) and (4.19) together, we obtain that

$$P(|\rho_{j+1}| \ge (\log n)^6/n^{1/4}) \le 2e^{-(\log n)^2/9} + 2e^{-\sqrt{n}} \le e^{-(\log n)^2/10},$$

as $n$ is sufficiently large. □

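Proof of Lemma 3.2. Write $m = n_\alpha$. Rewrite $\Delta_{k+1} = (\Delta_{1,k+1}, \Delta_{2,k+1}, \ldots,
\Delta_{n,k+1})^T \in \mathbb{R}^n$. By (4.9) and (4.10), $\mathcal{L}(\Delta_{i,k+1}) = \mathcal{L}(\sum_{j=1}^{k}\gamma_{ij}y_{j,k+1})$, so
conditioning on $y_1, y_2, \ldots, y_k$,

(4.20)  $$\Delta_{i,k+1} \sim N\Big(0,\ \sum_{j=1}^{k}\gamma_{ij}^2\Big).$$

Let $E_0 = \{\max_{2\le j\le k}|||\Delta_j||| < t\}$ and $E_i = \{|\Delta_{i,k+1}| < t\}$. Although each $E_i$
depends on $n$ and $k$, we would rather use the notation $E_i$ for simplification.
This will not cause confusion in the context. Evidently,

(4.21)  $$\Big\{\max_{2\le j\le k+1}|||\Delta_j||| < t\Big\} = \bigcap_{i=0}^{n} E_i.$$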

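To apply Lemma 4.1, we now calculate $P(E_0\backslash E_i)$. Define

$$\delta_n = \max_{(i,l)\in\Omega_n}\Big|\Big(\sum_{j=1}^{l}\gamma_{ij}^2\Big)^{-1/2} - \sqrt{\frac{n}{l}}\Big|,$$

where

$$\Omega_n = \{(i,l);\ 1 \le i \le n,\ n/(\log n)^3 \le l \le m\}.$$

Recall (4.20). Let $S_j$ be as in Lemma 4.2, then by the lemma and the fact
that $|\sqrt{a} - \sqrt{b}| \le |a - b|$ if $a \ge 1$,

(4.22)  $$P\Big(\delta_n \ge \frac{(\log n)^8}{\sqrt{n}}\Big) \le n^2\max P\Big(\Big|\frac{S_n}{S_l} - \frac{n}{l}\Big| \ge \frac{(\log n)^8}{\sqrt{n}}\Big) \le e^{-(\log n)^2}$$

for sufficiently large $n$, where the max above is taken over all $l$ such that
$n/(\log n)^3 \le l \le m$. By (4.20), for some standard normal $\xi$, we have $\Delta_{i,k+1} \sim
(\sum_{j=1}^{k}\gamma_{ij}^2)^{1/2}\xi$ conditioning on $y_1, y_2, \ldots, y_k$. Thus, $P(E_i^c \mid y_1, y_2, \ldots, y_k) =
P(|\xi| \ge (\sum_{j=1}^{k}\gamma_{ij}^2)^{-1/2}t \mid y_1, y_2, \ldots, y_k)$. It follows that on $\{\delta_n < (\log n)^8/\sqrt{n}\}$,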



(4.23)  $$f_n^+(k) := P\Big(|\xi| \ge t\Big(\sqrt{\frac{n}{k}} + \frac{(\log n)^8}{\sqrt{n}}\Big)\Big) \le P(E_i^c \mid y_1, y_2, \ldots, y_k) \le P\Big(|\xi| \ge t\Big(\sqrt{\frac{n}{k}} - \frac{(\log n)^8}{\sqrt{n}}\Big)\Big) =: f_n^-(k),$$

uniformly on $(i,k) \in \Omega_n$. The key observation for this proof is that the above
conditional probability is bounded above and below by unconditional
probabilities. Obviously, $E_0$ is a set in the $\sigma$-algebra generated by $y_1, y_2, \ldots, y_k$.
By (4.22) and (4.23),

$$P(E_0\backslash E_i) = E\{I_{E_0}(P(E_i^c \mid y_1, y_2, \ldots, y_k))\} \le P(E_0)f_n^-(k) + e^{-(\log n)^2}$$

for all $(i,k) \in \Omega_n$ when $n$ is sufficiently large. Similarly, use the first step
above to obtain

$$P(E_0\backslash E_i) \ge P(E_0\cap F_n)\cdot f_n^+(k) \ge P(E_0)\cdot f_n^+(k) - e^{-(\log n)^2},$$

for all $(i,k) \in \Omega_n$, where $F_n := \{\delta_n < (\log n)^8/\sqrt{n}\}$. Therefore,

(4.24)  $$nP(E_0)\cdot f_n^+(k) - ne^{-(\log n)^2} \le \sum_{i=1}^{n} P(E_0\backslash E_i) \le nP(E_0)\cdot f_n^-(k) + ne^{-(\log n)^2},$$

uniformly on $n/(\log n)^3 \le k \le m$ as $n$ is sufficiently large.

Finally, note that $e^{-t^2n/j}$ is increasing in $j$. By Lemma 4.3, $P(E_1^cE_2^c) \le
n^{-t^2/\alpha}(\log n)^C$ for some constant $C > 0$ as $n$ is sufficiently large. Also, the
$n$ random variables in $(\Delta_{1,k+1}, \Delta_{2,k+1}, \ldots, \Delta_{n,k+1})$ are exchangeable by the
Haar-invariance. Hence,

(4.25)  $$\sum_{1\le i<j\le n} P(E_i^cE_j^c) \le n^2P(E_1^cE_2^c) \le \frac{(\log n)^C}{n^{(t^2/\alpha)-2}}$$

as $n$ is sufficiently large. By (4.24), the quantity $P(E_0) - \sum_{i=1}^{n} P(E_0\backslash E_i)$ is
bounded above and below respectively by

$$(1 - nf_n^+(k))P(E_0) + ne^{-(\log n)^2} \qquad\text{and}\qquad (1 - nf_n^-(k))P(E_0) - ne^{-(\log n)^2}.$$

This together with (4.25) yields the desired conclusion via Lemma 4.1. □

APPENDIX 

The following is a standard result. It can be found in, for example, Lemma 
3 on page 49 from [6]. 

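Lemma A.1. Suppose $X \sim N(0,1)$. Then

$$\frac{1}{\sqrt{2\pi}}\,\frac{x}{1 + x^2}\,e^{-x^2/2} \le P(X \ge x) \le \frac{1}{\sqrt{2\pi}}\,\frac{1}{x}\,e^{-x^2/2}$$

for all $x > 0$.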
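The two-sided Gaussian tail bound in Lemma A.1, namely $\frac{x}{1+x^2}\varphi(x) \le P(X \ge x) \le \frac{1}{x}\varphi(x)$ with $\varphi$ the standard normal density, can be checked directly against the complementary error function. The following is a small illustrative sketch; the function names and the grid are assumptions, not part of the source.

```python
import math


def normal_tail(x: float) -> float:
    """P(X >= x) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bounds(x: float):
    """Lower and upper bounds on P(X >= x) for x > 0 as in Lemma A.1."""
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return x / (1.0 + x * x) * phi, phi / x


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0, 8.0):
        lower, upper = tail_bounds(x)
        print(x, lower, normal_tail(x), upper)
```

Both bounds become sharp as $x$ grows, which is why the proofs above can treat $P(|\xi| > x)$ as $(2/(\sqrt{2\pi}x))e^{-x^2/2}$ for large $x$.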

The following lemma is part (ii) on page 186 from [29].

Lemma A.2. Suppose $y$ is an $\mathbb{R}^n$-valued random vector with multinormal
distribution with mean $0$ and covariance matrix $\Sigma$ of rank $r$. If $\Sigma^2 =
\Sigma$, then there exists a sequence of independent standard normals $\{\xi_j;\ j =
1, 2, \ldots, n\}$ such that $\|y\|^2$ has the same distribution as that of $\sum_{j=1}^{r}\xi_j^2$, that
is, $\|y\|^2 \sim \chi^2(r)$.



For $A \subset \mathbb{R}$, the interior and the closure of $A$ in $\mathbb{R}$ are denoted by $A^\circ$ and $\bar{A}$,
respectively. The following are Chernoff's bound and a moderate deviation
result. They can be found from, for example, (c) of Remarks on page 27
from [9] and Theorem 3.7.1 on page 109 from [9].

Lemma A.3. Let $\{X, X_i, i = 1, 2, \ldots\}$ be a sequence of i.i.d. random
variables. Let $S_n = \sum_{i=1}^{n} X_i$, $n \ge 1$. Then:

(i) For any $A \subset \mathbb{R}$ and $n \ge 1$,

$$P(S_n/n \in A) \le 2e^{-nI(A)},$$

where $I(x) = \sup_{t\in\mathbb{R}}\{tx - \log E(e^{tX})\}$ and $I(A) = \inf_{x\in A} I(x)$.

(ii) Assume further that $EX = 0$, $\mathrm{var}(X) = \sigma^2 > 0$ and $Ee^{t_0|X|} < \infty$ for
some $t_0 > 0$. Let $\{a_n;\ n = 1, 2, \ldots\}$ be a sequence of positive numbers such
that $a_n \to 0$ and $na_n \to \infty$ as $n\to\infty$. Then

$$\lim_{n\to\infty} a_n\log P\Big(\sqrt{\frac{a_n}{n}}\,S_n \in A\Big) = -\inf_{x\in A}\Big\{\frac{x^2}{2\sigma^2}\Big\}$$

for any subset $A \subset \mathbb{R}$ such that $\inf\{|x|;\ x \in A^\circ\} = \inf\{|x|;\ x \in \bar{A}\}$.

The following lemma is Theorem 5 from [19].

Lemma A.4. For each $n \ge 2$, there exist matrices $\Gamma_n = (\gamma_{ij})_{1\le i,j\le n}$ and
$Y_n = (y_{ij})_{1\le i,j\le n}$ whose $2n^2$ elements are random variables defined on the
same probability space such that:

(i) the law of $\Gamma_n$ is the normalized Haar measure on the orthogonal group
$O_n$;

(ii) $\{y_{ij};\ 1 \le i,j \le n\}$ are i.i.d. random variables with the standard normal
distribution;

(iii) set $\epsilon_n(m) = \max_{1\le i\le n,\,1\le j\le m}|\sqrt{n}\gamma_{ij} - y_{ij}|$ for $m = 1, 2, \ldots, n$. Then

$$P(\epsilon_n(m) \ge rs + 2t) \le 4me^{-nr^2/16} + 3mn\Big(\frac{1}{s}e^{-s^2/2} + \frac{1}{t}\Big(1 + \frac{t^2}{3(m + \sqrt{n})}\Big)^{-n/2}\Big)$$

for any $r \in (0, 1/4)$, $s > 0$, $t > 0$, and $m \le (r/2)n$.
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